MrTrenchTrucker/contak-browser-skills

CONTAK Browser Skills for AI Agents

Two drop-in skills that let an AI agent control a browser:

  • Headless screenshot -- quick PNG of a URL via a Browserless Chromium service. No login, no session, no GUI. One-shot.
  • Interactive browser -- full persistent browser via KASM Chrome + Chrome DevTools Protocol. Real cookies, real session state, real synthetic events. Use this when a flow needs login state, captchas, or human-style interaction.

Use one, use both, or use neither -- the skills are independent.

What's in here

.
+-- README.md                                      <- you are here
+-- assets/
|   \-- cover.png
+-- skills/
|   +-- browser-screenshot/SKILL.md                <- drop-in slash-skill, headless
|   \-- kasm-browser/SKILL.md                      <- drop-in slash-skill, interactive
\-- sops/
    +-- SOP_Headless_Browser_Screenshot.md         <- full setup + cross-host + troubleshooting
    \-- SOP_Interactive_Browser_via_KASM_CDP.md    <- same, for the interactive case (incl. Human-Speed Rule)

Quick start

For the headless screenshot skill

# Start Browserless Chromium; container port 3000 is published on host port 30001.
docker run -d --name browser-headless --restart unless-stopped \
  -p 30001:3000 \
  ghcr.io/browserless/chromium:latest

# Request a 1920x1080 PNG of the target URL and save it locally.
curl -s -X POST http://localhost:30001/screenshot \
  -H 'Content-Type: application/json' \
  -d '{"url":"https://example.com","options":{"type":"png"},"viewport":{"width":1920,"height":1080}}' \
  -o /tmp/screenshot.png

Then drop skills/browser-screenshot/SKILL.md into your agent's skill directory.
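
If you take screenshots often, the curl call above can be wrapped in a small helper. The following is a minimal sketch, not part of the shipped skill: the function names `screenshot_payload` and `take_screenshot` and the `BROWSERLESS_URL` variable are hypothetical, and the payload shape is copied from the quick-start command.

```shell
# Hypothetical helpers; names and defaults are illustrative assumptions.

# Build the JSON body for the /screenshot endpoint.
# The URL is inserted verbatim, so it must not contain double quotes or backslashes.
screenshot_payload() {
  local url="$1" width="${2:-1920}" height="${3:-1080}"
  printf '{"url":"%s","options":{"type":"png"},"viewport":{"width":%d,"height":%d}}' \
    "$url" "$width" "$height"
}

# POST the payload and write the response to the given path.
# --fail makes curl exit non-zero on HTTP errors instead of saving an error body as a .png.
take_screenshot() {
  local url="$1" out="$2" endpoint="${BROWSERLESS_URL:-http://localhost:30001}"
  curl -s --fail -X POST "$endpoint/screenshot" \
    -H 'Content-Type: application/json' \
    -d "$(screenshot_payload "$url")" \
    -o "$out"
}

# Example (requires the container from the quick start to be running):
#   take_screenshot https://example.com /tmp/screenshot.png
```

Keeping payload construction separate from the HTTP call makes the request body easy to inspect before anything is sent.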

For the interactive KASM skill

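# Start KASM Chrome with the DevTools (CDP) port enabled; port 6901 serves the KASM web session.
docker run -d --name kasm-chrome --restart unless-stopped \
  -e CHROME_CLI="--remote-debugging-port=9222 --remote-debugging-address=0.0.0.0" \
  -p 6901:6901 \
  kasmweb/chrome:latest

# CDP port 9222 is not published to the host here, so query it from inside the container.
docker exec kasm-chrome curl -s http://localhost:9222/json/version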

Then drop skills/kasm-browser/SKILL.md into your agent's skill directory.
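
The `/json/version` response includes a `webSocketDebuggerUrl` field, which is the address a CDP client connects to. As a rough sketch, assuming no JSON tool such as `jq` is installed, the hypothetical helper `cdp_ws_url` below pulls that field out with `sed`:

```shell
# Hypothetical helper: read a /json/version response on stdin and
# print the value of its "webSocketDebuggerUrl" field.
# This is a plain text match, not a real JSON parser.
cdp_ws_url() {
  sed -n 's/.*"webSocketDebuggerUrl"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (requires the kasm-chrome container from the quick start):
#   docker exec kasm-chrome curl -s http://localhost:9222/json/version | cdp_ws_url
```

An empty result usually means the debugging port is not answering, which is worth checking before the agent attempts any interactive step.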

Read the matching SOP in sops/ for full setup, cross-host configuration, common recipes, and troubleshooting.

How to adapt to your project

The two skills are intentionally generic. To turn them into a team-specific SOP:

  1. Replace placeholders (<host>, <port>, <container>, <kasm-host>, <local-path>, <target-url>) with your environment's values.
  2. Add team-specific context -- who maintains the container, escalation contacts, common URLs you screenshot or automate against.
  3. If you have policies on which sites are OK to automate (Terms-of-Service, allowlists), document them at the top.
  4. Add representative examples drawn from your real workflows.
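
Step 1 can be scripted. The sketch below is one possible approach, not something shipped in the repository: `fill_placeholders` and `unfilled_placeholders` are hypothetical names, and only three of the placeholders are handled to keep the example short.

```shell
# Hypothetical helper: substitute <host>, <port>, and <container> in text read from stdin.
# Values must not contain "|" or "&", which have special meaning in the sed expressions.
# Usage: fill_placeholders HOST PORT CONTAINER < template.md > team-sop.md
fill_placeholders() {
  local host="$1" port="$2" container="$3"
  sed -e "s|<host>|$host|g" \
      -e "s|<port>|$port|g" \
      -e "s|<container>|$container|g"
}

# Hypothetical helper: list any lowercase <placeholder> tokens still present in stdin,
# one per line, so a leftover is not missed.
unfilled_placeholders() {
  grep -o '<[a-z][a-z-]*>' | sort -u
}
```

Running `unfilled_placeholders` on the result is a cheap way to confirm that nothing such as `<kasm-host>` or `<target-url>` is still waiting for a value.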

Each SOP file in sops/ has a dedicated chapter for this -- see "Generating an SOP from This Skill."

Choosing between the two

Use case                                          | Skill
--------------------------------------------------|------------------------------
Single screenshot of a static URL, no login       | browser-screenshot (headless)
Visual smoke-test of your own dashboard           | browser-screenshot
Login flow with captcha or 2FA                    | kasm-browser (interactive)
Multi-step flow where session cookies matter      | kasm-browser
React app needing real synthetic events           | kasm-browser
Scraping behind a logged-in UI                    | kasm-browser

Responsible use

  • Always check the target site's Terms of Service before automating against it.
  • The skills are intended for sites you legitimately have access to -- your own dashboards, vendors where you have an account, services you operate.
  • Apply the Human-Speed Rule (described in the KASM SOP, chapter 4) any time you drive a real browser. Bot-detection walls are real, and slow pacing is the cheapest way through them.
  • Never log secrets, OAuth tokens, session cookies, or passwords into chat surfaces or persistent logs.
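
The Human-Speed Rule itself is defined in the KASM SOP, not in this README. As an illustration of slow pacing only, and assuming the rule amounts to randomized pauses between browser actions, a delay helper might look like this. The names `random_delay` and `human_pause` and the 2-6 second default are assumptions, not values taken from the SOP.

```shell
# Hypothetical helper: print a whole number of seconds in [min, max], inclusive.
# The seed combines the clock and the shell PID, so two calls within the same
# second can return the same value; that is acceptable for pacing purposes.
random_delay() {
  local min="$1" max="$2"
  awk -v min="$min" -v max="$max" -v seed="$(date +%s)$$" \
    'BEGIN { srand(seed); print min + int(rand() * (max - min + 1)) }'
}

# Hypothetical helper: sleep for a random interval (default 2-6 seconds, an assumed range).
human_pause() {
  sleep "$(random_delay "${1:-2}" "${2:-6}")"
}

# Example: call human_pause between clicks, keystrokes, or page loads.
```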

License

MIT -- use it, fork it, ship it. See LICENSE for the full text.

Credits

These skills were extracted from the internal browser-tooling SOPs of the CONTAK fleet -- a heterogeneous AI agent system running on a single home server. The original docs were heavily fleet-specific; this public version is scrubbed and made generic so anyone with a similar use case can drop the skills in.


If you build something cool with these, drop a note. Issues + PRs welcome.
