"You sleep. It ships."
You close your laptop at midnight. 47 TODOs in your backlog.
You open it at 8am. 38 of them are done, tested, committed, on a clean branch.
Total cost: $4.20. No meetings required.
That's autonomous.
A self-driving project agent for Claude Code.
Drop it into any git repo, run /autonomous, go to sleep.
Quickstart · Skills · Architecture · How It Works · Configuration · Safety · Testing
Requirements: Claude Code, Git, Python 3.9+
Optional: tmux (visible worker windows), jq (persona generation)
Paste this into Claude Code:
Install autonomous: git clone https://github.com/Sma1lboy/autonomous.git ~/.claude/skills/autonomous-skill && cd ~/.claude/skills/autonomous-skill && ./setup
That's it. Open any git repo and run /autonomous or /quickdo.
This package ships two public skills:
The complete pipeline: Conductor → Sprint Master → Worker. Runs multiple sprints, transitions between directed work and autonomous exploration, manages sprint branches, evaluates results between sprints.
# Default: 10 sprints
/autonomous
# Quick: 3 sprints
/autonomous 3
# With direction: focus on a specific area
/autonomous 5 build REST API
# Direction only (default 10 sprints)
/autonomous fix all auth bugsLightweight mode. Skips the conductor, runs one sprint master directly via blocking
claude -p. No tmux, no multi-sprint state, no monitor polling. One direction, one
sprint, done.
# Single task
/quickdo add login page with GitHub OAuth
# Quick fix
/quickdo fix the broken unit testsBest for tasks that fit in a single sprint — a full page, a complete feature stage, a test suite, a refactor.
# Direct CLI invocation via loop.py
AUTONOMOUS_DIRECTION="fix auth bugs" python3 scripts/loop.py /path/to/projectThree-layer hierarchy with full context isolation between layers:
Conductor (autonomous/SKILL.md — runs in user's Claude Code session)
│
├── Plans sprint directions (directed phase or exploration phase)
├── Dispatches sprint masters via claude -p
├── Evaluates sprint results, manages phase transitions
│
└── Sprint Master (SPRINT.md — separate claude -p session)
│
├── Sense → Direct → Respond → Summarize loop
├── Dispatches workers via claude -p
├── Answers worker questions via comms.json protocol
│
└── Worker (full Claude session with all tools)
│
└── Executes the actual work: reads code, edits files,
runs tests, commits changes
Each layer runs in its own Claude session — fresh context per sprint, no bleed between layers.
/quickdo flattens this to two layers: it skips the conductor and runs the sprint master directly.
Backlog — A persistent work queue (.autonomous/backlog.json) that survives across sessions. Workers log out-of-scope discoveries, the conductor decomposes large missions into deferred items. When exploration runs dry, idle sprints pick from the backlog. Progressive disclosure: sprint masters only see one-line titles, the conductor sees full descriptions.
Worker-task suggestions and boundary blacklists are driven by swappable templates at templates/<name>/template.md. Ships with two: gstack (default — uses /office-hours, /qa, /investigate, blocks /ship etc.) and default (generic, no toolchain commands).
Select a template per project by writing .autonomous/skill-config.json in your project root:
{ "template": "default" }The project-level override beats the skill-root default at ~/.claude/skills/autonomous-skill/skill-config.json. Unknown template names fall through to default. To add a new template, create templates/<name>/template.md with ## Allow and ## Block sections and point the config at it.
- Persona —
persona.pyreads your git history + project docs to understand your coding style. WritesOWNER.md. - Discovery — The conductor talks to you to understand the mission. If you passed a direction in args, it confirms and moves on.
- Session — Creates an
auto/session-TIMESTAMPbranch and initializesconductor-state.json. - Conductor loop — Plan → Dispatch → Monitor → Evaluate → Repeat:
- Directed phase: breaks your mission into sprint-sized tasks, dispatches one sprint master per task
- Phase transition: when direction is complete (2 consecutive signals + commits, max sprints reached, or 2 zero-commit sprints)
- Exploration phase: scans the project across 8 dimensions, picks the weakest, generates improvement sprints
- Sprint execution — Each sprint master gets a fresh
claude -psession, dispatches a worker, answers questions viacomms.json, and writessprint-summary.jsonwhen done. - Merge/discard — Successful sprints merge back to the session branch. Failed sprints are discarded.
- Backlog pickup — When exploration dimensions are all solid, the conductor checks the backlog for deferred work items before stopping.
- Session ends when all sprints are used up, the project feels solid, and the backlog is empty.
When the directed mission is complete, the conductor autonomously explores 8 dimensions:
| Dimension | What it audits |
|---|---|
test_coverage |
Untested code paths, missing edge cases |
error_handling |
Missing error messages, unhandled failures |
security |
Hardcoded secrets, injection vulnerabilities, input validation |
code_quality |
Dead code, duplication, overly complex functions |
documentation |
README accuracy, missing docstrings, stale docs |
architecture |
Module boundaries, dependency directions, separation of concerns |
performance |
N+1 queries, blocking I/O, missing caching |
dx |
CLI help text, error messages, setup instructions |
Dimensions are scored via fast Python heuristics (explore-scan.py), and the weakest is selected for each exploration sprint.
Workers can't use AskUserQuestion in subagent context. Instead, they write questions to .autonomous/comms.json:
{"status": "waiting", "questions": [{"question": "...", "options": [...]}], "rec": "A"}The sprint master polls, decides using product intuition (or OWNER.md guidance), and writes back:
{"status": "answered", "answers": ["A"]}Valid statuses: idle, waiting, answered, done.
Set AUTONOMOUS_WORKER_CAREFUL=1 to install a PreToolUse hook on every dispatched worker that blocks catastrophic Bash commands:
AUTONOMOUS_WORKER_CAREFUL=1 /autonomous 5 build REST APIBlocks: rm -rf /, rm -rf $HOME, rm -rf /Users|/home, mkfs, dd of=/dev/sd*, fork bombs, device redirects (>, >>, >|, tee, cp to /dev/*), shutdown/reboot, git push --force (all variants), DROP TABLE/DATABASE/SCHEMA, TRUNCATE TABLE. Also catches interpreter wrappers like python3 -c 'os.system("rm -rf /")' and chaining bypasses like echo ok; rm -rf /.
Configured per-sprint via claude --settings <file> — no global settings change. Blocks are exit-2 with a stderr message; the worker reads "BLOCKED: ..." and adapts.
Take a human-readable snapshot of the current session anytime:
python3 ~/.claude/skills/autonomous-skill/scripts/checkpoint.py save .
python3 ~/.claude/skills/autonomous-skill/scripts/checkpoint.py save . --title "pre-refactor"
python3 ~/.claude/skills/autonomous-skill/scripts/checkpoint.py list .
python3 ~/.claude/skills/autonomous-skill/scripts/checkpoint.py latest .Each checkpoint is a markdown file at .autonomous/checkpoints/<ts>-<slug>.md capturing mission, phase, sprint history, backlog summary, exploration dimension scores, git state, and resume guidance. History is retained — old checkpoints stay until manually deleted.
Useful for context switching ("where was I yesterday?"), sharing session state with a teammate, or reviewing sprint output before resuming.
Set AUTONOMOUS_SPRINT_WORKTREES=1 to run each sprint in its own git worktree under .worktrees/sprint-N/:
AUTONOMOUS_SPRINT_WORKTREES=1 /autonomous 5 build REST API- Main tree stays on the session branch the whole time — no
git checkout -bchurn - Each sprint works on its own branch in its own directory, fully file-isolated
.autonomous/is symlinked from each worktree back to the main tree, so comms, state, summaries, and backlog all go through one source of truth.worktrees/is auto-added to.gitignoreon first sprint- On sprint completion: merge runs first (with
--keep-branch), then the worktree is removed, then the branch is deleted. If merge conflicts, worktree and branch are preserved for forensic inspection. .worktrees/or.autonomous/pre-existing as symlinks are refused (repo-escape prevention).
V1 is serial-only — one sprint at a time. Parallel sprint dispatch is deferred to a future PR.
| Variable | Default | Description |
|---|---|---|
MAX_SPRINTS (via args) |
10 |
Max conductor sprints |
MAX_ITERATIONS |
50 |
Max iterations for loop.py standalone mode |
CC_TIMEOUT |
900 |
Timeout per CC invocation (seconds) |
AUTONOMOUS_DIRECTION |
(none) | Session focus (e.g., "fix auth bugs") |
MAX_COST_USD |
(none) | Stop when total cost exceeds this |
DISPATCH_MODE |
(auto) | blocking (no tmux), headless (background), or auto (tmux if available) |
autonomous-skill/
├── autonomous/SKILL.md # /autonomous — multi-sprint conductor
├── quickdo/SKILL.md # /quickdo — fast single-sprint mode
├── SPRINT.md # Sprint master: per-sprint execution (inlined into prompt)
├── CLAUDE.md # Project instructions for Claude
├── OWNER.md.template # Persona template for manual config
├── skill-config.json # Default template selector (per-project override at .autonomous/)
├── templates/
│ ├── gstack/template.md # Allow/Block sections for gstack toolchain
│ └── default/template.md # Generic fallback, no toolchain assumptions
├── scripts/
│ ├── startup.py # SCRIPT_DIR resolution + project context (shared)
│ ├── parse-args.py # Parse ARGS → _MAX_SPRINTS + _DIRECTION
│ ├── session-init.py # Create session branch, init state + backlog
│ ├── build-sprint-prompt.py # Inline SPRINT.md + params → sprint-prompt.md
│ ├── dispatch.py # Blocking / tmux / headless session dispatch
│ ├── monitor-sprint.py # Poll for sprint-summary.json
│ ├── monitor-worker.py # Poll comms.json + tmux/process liveness
│ ├── evaluate-sprint.py # Read summary JSON, update conductor state
│ ├── merge-sprint.py # Merge or discard sprint branch
│ ├── write-summary.py # Generate sprint-summary.json
│ ├── conductor-state.py # State management (atomic writes, PID lock)
│ ├── explore-scan.py # 8-dimension project scanner
│ ├── backlog.py # Cross-session persistent backlog
│ ├── persona.py # OWNER.md auto-generation
│ ├── loop.py # Standalone launcher (outside CC)
│ ├── master-poll.py # Manual master polling for comms.json
│ └── master-watch.py # Dual-channel monitor (comms + JSONL)
├── tests/
│ ├── test_helpers.sh # Shared test framework
│ ├── test_conductor.sh # 99 tests: state, phase transitions, exploration
│ ├── test_comms.sh # 34 tests: comms.json protocol
│ ├── test_persona.sh # 20 tests: OWNER.md generation
│ ├── test_explore_scan.sh # 45 tests: dimension scoring heuristics
│ ├── test_loop.sh # 20 tests: standalone launcher
│ ├── test_backlog.sh # 76 tests: CRUD, progressive disclosure
│ ├── test_build_sprint_prompt.sh # 25 tests: template resolution, allow/block injection
│ ├── test_eval_output.sh # 35 tests: eval safety, tmux cleanup
│ └── claude # Mock CC binary for testing
├── .claude/skills/ # Internal dev/test skills
│ ├── smoke-test/SKILL.md # E2E pipeline smoke test
│ ├── test-worker/SKILL.md # Spawns worker + auto-answering master
│ ├── capture-worker/SKILL.md # Capture worker JSONL for inspection
│ ├── diff-sessions/SKILL.md # Compare two sessions side-by-side
│ ├── clean-sandbox/SKILL.md # Reset test sandbox
│ └── clean-gstack/SKILL.md # Delete gstack design doc archives
└── README.md
Generated at runtime (gitignored):
OWNER.md— your persona, auto-generated from git + docs.autonomous/conductor-state.json— multi-sprint state machine.autonomous/comms.json— worker↔master IPC.autonomous/sprint-summary.json— per-sprint results
| Guard | How |
|---|---|
| Branch isolation | All work on auto/session-* or auto/quickdo-* branches. Never touches main. |
| Per-sprint branches | Each sprint works on its own branch; merged on success, discarded on failure. |
| Timeout | Each CC invocation capped at 15 min (configurable via CC_TIMEOUT). |
| Cost budget | MAX_COST_USD env var stops the session when exceeded. |
| Excluded workflows | Configured per template (see templates/<name>/template.md ## Block section). |
| Graceful shutdown | SIGINT + sentinel file for clean exit across all layers. |
| 3-strike rule | Same approach fails 3 times → stop and report. |
| Atomic state | Conductor state uses tmp+mv writes, PID lock for concurrency safety. |
329 tests across 7 suites, all pure bash:
bash tests/test_conductor.sh # 99 tests
bash tests/test_comms.sh # 34 tests
bash tests/test_persona.sh # 20 tests
bash tests/test_explore_scan.sh # 45 tests
bash tests/test_loop.sh # 20 tests
bash tests/test_backlog.sh # 76 tests
bash tests/test_eval_output.sh # 35 tests
python3 -m compileall scripts # quick syntax check for Python helpersThe test harness uses tests/claude (a mock CC binary) controlled by env vars:
| Variable | Effect |
|---|---|
MOCK_CLAUDE_COST |
Reported cost per invocation |
MOCK_CLAUDE_COMMIT=1 |
Make a git commit during the mock run |
MOCK_CLAUDE_DELAY |
Sleep N seconds (for timeout tests) |
MOCK_CLAUDE_EXIT |
Exit code to return |
# See what the agent did
git log main..auto/session-TIMESTAMP --oneline
# Detailed diff
git diff main..auto/session-TIMESTAMP --stat
# Merge if satisfied
git checkout main && git merge auto/session-TIMESTAMP
# Or cherry-pick specific commits
git cherry-pick COMMIT_HASHMIT