Two drop-in skills that let an AI agent control a browser:
- Headless screenshot -- quick PNG of a URL via a Browserless Chromium service. No login, no session, no GUI. One-shot.
- Interactive browser -- full persistent browser via KASM Chrome + Chrome DevTools Protocol. Real cookies, real session state, real synthetic events. Use this when a flow needs login state, captchas, or human-style interaction.
Use one, use both, or use neither -- the skills are independent.
.
+-- README.md <- you are here
+-- assets/
| \-- cover.png
+-- skills/
| +-- browser-screenshot/SKILL.md <- drop-in slash-skill, headless
| \-- kasm-browser/SKILL.md <- drop-in slash-skill, interactive
\-- sops/
    +-- SOP_Headless_Browser_Screenshot.md <- full setup + cross-host + troubleshooting
    \-- SOP_Interactive_Browser_via_KASM_CDP.md <- same, for the interactive case (incl. Human-Speed Rule)
docker run -d --name browser-headless --restart unless-stopped \
-p 30001:3000 \
ghcr.io/browserless/chromium:latest
curl -s -X POST http://localhost:30001/screenshot \
-H 'Content-Type: application/json' \
-d '{"url":"https://example.com","options":{"type":"png"},"viewport":{"width":1920,"height":1080}}' \
-o /tmp/screenshot.png

Then drop skills/browser-screenshot/SKILL.md into your agent's skill directory.
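For agents that call the service from code rather than from a shell, the same request can be sketched in Python with only the standard library. This assumes the Browserless container from the `docker run` command above is reachable on `localhost:30001`; the names `build_screenshot_payload` and `take_screenshot` are illustrative and not part of the skill.

```python
import json
import urllib.request

# Assumed endpoint: matches the -p 30001:3000 port mapping used above.
SCREENSHOT_ENDPOINT = "http://localhost:30001/screenshot"


def build_screenshot_payload(url, width=1920, height=1080):
    """Return the same JSON body that the curl example sends."""
    return {
        "url": url,
        "options": {"type": "png"},
        "viewport": {"width": width, "height": height},
    }


def take_screenshot(url, out_path, endpoint=SCREENSHOT_ENDPOINT, timeout=60):
    """POST the payload and write the response body (PNG bytes) to out_path."""
    body = json.dumps(build_screenshot_payload(url)).encode("utf-8")
    request = urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        png_bytes = response.read()
    with open(out_path, "wb") as handle:
        handle.write(png_bytes)
    return len(png_bytes)
```

Calling `take_screenshot("https://example.com", "/tmp/screenshot.png")` should produce the same file as the curl command, provided the container is running.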
docker run -d --name kasm-chrome --restart unless-stopped \
-e CHROME_CLI="--remote-debugging-port=9222 --remote-debugging-address=0.0.0.0" \
-p 6901:6901 \
kasmweb/chrome:latest
docker exec kasm-chrome curl -s http://localhost:9222/json/version

Then drop skills/kasm-browser/SKILL.md into your agent's skill directory.
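The `/json/version` check above returns a small JSON document; Chrome reports the browser build under `Browser` and the DevTools WebSocket address under `webSocketDebuggerUrl`. A minimal sketch of reading it from Python follows. It assumes the container is named `kasm-chrome` as in the command above, and the helper names `fetch_cdp_version` and `parse_cdp_version` are hypothetical.

```python
import json
import subprocess


def fetch_cdp_version(container="kasm-chrome", port=9222):
    """Run the same docker exec + curl check and return the raw JSON text."""
    result = subprocess.run(
        ["docker", "exec", container, "curl", "-s",
         f"http://localhost:{port}/json/version"],
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return result.stdout


def parse_cdp_version(raw_json):
    """Extract the browser build and WebSocket debugger URL from the response."""
    info = json.loads(raw_json)
    ws_url = info.get("webSocketDebuggerUrl")
    if not ws_url:
        raise ValueError("CDP endpoint did not report a webSocketDebuggerUrl")
    return {"browser": info.get("Browser", "unknown"), "ws_url": ws_url}
```

An empty response or a `ValueError` here usually means Chrome was not started with the remote-debugging flags, which is worth ruling out before debugging the skill itself.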
Read the matching SOP in sops/ for the full setup, cross-host setup, common recipes, and troubleshooting.
The two skills are intentionally generic. To turn them into a team-specific SOP:
- Replace placeholders (<host>, <port>, <container>, <kasm-host>, <local-path>, <target-url>) with your environment's values.
- Add team-specific context -- who maintains the container, escalation contacts, common URLs you screenshot or automate against.
- If you have policies on which sites are OK to automate (Terms-of-Service, allowlists), document them at the top.
- Add representative examples drawn from your real workflows.
Each SOP file in sops/ has a dedicated chapter for this -- see "Generating an SOP from This Skill."
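The placeholder step can be scripted. The sketch below substitutes only the placeholder names listed above and reports any that were left unfilled; the function name `fill_placeholders` and the example values are assumptions for illustration.

```python
import re

# Placeholder names taken from the list above.
KNOWN_PLACEHOLDERS = (
    "host", "port", "container", "kasm-host", "local-path", "target-url",
)

PLACEHOLDER_PATTERN = re.compile(
    "<(" + "|".join(re.escape(name) for name in KNOWN_PLACEHOLDERS) + ")>"
)


def fill_placeholders(text, values):
    """Replace known <name> placeholders; return (new_text, unresolved_names)."""
    unresolved = []

    def substitute(match):
        name = match.group(1)
        if name in values:
            return str(values[name])
        # Leave the placeholder in place and record it once.
        if name not in unresolved:
            unresolved.append(name)
        return match.group(0)

    return PLACEHOLDER_PATTERN.sub(substitute, text), unresolved
```

Restricting the pattern to known names keeps other angle-bracket text, such as inline HTML in a markdown file, untouched. A non-empty `unresolved` list is a cheap check to run before publishing the team-specific SOP.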
| Use case | Skill |
|---|---|
| Single screenshot of a static URL, no login | browser-screenshot (headless) |
| Visual smoke-test of your own dashboard | browser-screenshot |
| Login flow with captcha or 2FA | kasm-browser (interactive) |
| Multi-step flow where session cookies matter | kasm-browser |
| React app needing real synthetic events | kasm-browser |
| Scraping behind a logged-in UI | kasm-browser |
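Read as a decision rule, the table says: use the interactive skill as soon as a task needs login, session state, or real input events, and the headless one otherwise. A minimal sketch of that rule, with hypothetical parameter names, might look like this:

```python
def choose_skill(needs_login=False, needs_session=False, needs_real_events=False):
    """Map task requirements to a skill name, following the table above."""
    if needs_login or needs_session or needs_real_events:
        return "kasm-browser"
    return "browser-screenshot"
```

An agent could call this once per task and load only the matching SKILL.md.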
- Always check the target site's Terms of Service before automating against it.
- The skills are intended for sites you legitimately have access to -- your own dashboards, vendors where you have an account, services you operate.
- Apply the Human-Speed Rule (described in the KASM SOP, chapter 4) any time you drive a real browser. Bot-detection walls are real, and slow pacing is the cheapest way through them.
- Never log secrets, OAuth tokens, session cookies, or passwords into chat surfaces or persistent logs.
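Two of the guidelines above lend themselves to small helpers: pacing actions at human speed, and scrubbing secrets before anything is logged. The sketch below is illustrative only. The delay bounds are assumed values, not the ones from the KASM SOP (chapter 4 is the authority on the Human-Speed Rule), and the regular expressions cover only a few common secret formats.

```python
import random
import re
import time

# Assumed patterns; extend them for the secret formats in your environment.
SECRET_PATTERNS = [
    # key=value or key: value pairs such as password=..., access_token: ...
    re.compile(r"(?i)\b([\w-]*(?:password|token|secret|api[_-]?key)[\w-]*\s*[=:]\s*)(\S+)"),
    # HTTP Authorization headers carrying a bearer token
    re.compile(r"(?i)(authorization\s*:\s*bearer\s+)(\S+)"),
    # Cookie and Set-Cookie headers: redact the rest of the line
    re.compile(r"(?i)((?:set-)?cookie\s*:\s*)(.+)"),
]


def redact(text):
    """Replace secret values with [REDACTED], keeping the key or header name."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(lambda match: match.group(1) + "[REDACTED]", text)
    return text


def human_delay(min_seconds=0.8, max_seconds=2.5):
    """Sleep for a random interval (assumed bounds) and return its length."""
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

Calling `human_delay()` between clicks and keystrokes, and passing every log line through `redact()` before it leaves the process, covers the common cases; neither replaces reading the SOP or reviewing what your agent actually logs.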
MIT -- use it, fork it, ship it. See LICENSE for the full text.
These skills were extracted from the internal browser-tooling SOPs of the CONTAK fleet -- a heterogeneous AI agent system running on a single home server. The original docs were heavily fleet-specific; this public version is scrubbed and made generic so anyone with a similar use case can drop the skills in.
If you build something cool with these, drop a note. Issues + PRs welcome.
