Sma1lboy/autonomous

autonomous

"You sleep. It ships."

License: MIT · Claude Code · AgentSkills · Python


You close your laptop at midnight. 47 TODOs in your backlog.
You open it at 8am. 38 of them are done, tested, committed, on a clean branch.
Total cost: $4.20. No meetings required.

That's autonomous.

A self-driving project agent for Claude Code. Drop it into any git repo, run /autonomous, go to sleep.

Quickstart · Skills · Architecture · How It Works · Configuration · Safety · Testing


Install — 10 seconds

Requirements: Claude Code, Git, Python 3.9+

Optional: tmux (visible worker windows), jq (persona generation)

Paste this into Claude Code:

Install autonomous: git clone https://github.com/Sma1lboy/autonomous.git ~/.claude/skills/autonomous-skill && cd ~/.claude/skills/autonomous-skill && ./setup

That's it. Open any git repo and run /autonomous or /quickdo.


Skills

This package ships two public skills:

/autonomous — Full multi-sprint orchestration

The complete pipeline: Conductor → Sprint Master → Worker. Runs multiple sprints, transitions between directed work and autonomous exploration, manages sprint branches, evaluates results between sprints.

# Default: 10 sprints
/autonomous

# Quick: 3 sprints
/autonomous 3

# With direction: focus on a specific area
/autonomous 5 build REST API

# Direction only (default 10 sprints)
/autonomous fix all auth bugs
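
The argument forms above suggest a simple rule: a leading integer is the sprint count, and everything else is the direction. The following is a minimal sketch of that rule, not the project's parse-args.py; the function name `parse_args` and the exact tokenisation are assumptions made for illustration.

```python
from typing import Optional, Tuple

DEFAULT_SPRINTS = 10  # the documented default sprint count


def parse_args(args: str) -> Tuple[int, Optional[str]]:
    """Split raw skill arguments into (max_sprints, direction).

    Assumption: a leading positive integer is the sprint count and the
    remaining words form the direction.
    """
    tokens = args.split()
    if tokens and tokens[0].isdecimal() and int(tokens[0]) > 0:
        max_sprints = int(tokens[0])
        rest = tokens[1:]
    else:
        max_sprints = DEFAULT_SPRINTS
        rest = tokens
    direction = " ".join(rest) or None
    return max_sprints, direction


print(parse_args("5 build REST API"))
print(parse_args("fix all auth bugs"))
```

Under these assumptions, a bare `/autonomous` yields the default count and no direction, which is the case where the conductor would ask for the mission during discovery.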

/quickdo — Fast single-sprint execution

Lightweight mode. Skips the conductor and runs one sprint master directly via a blocking claude -p call. No tmux, no multi-sprint state, no monitor polling. One direction, one sprint, done.

# Single task
/quickdo add login page with GitHub OAuth

# Quick fix
/quickdo fix the broken unit tests

Best for tasks that fit in a single sprint — a full page, a complete feature stage, a test suite, a refactor.

Standalone (outside Claude Code)

# Direct CLI invocation via loop.py
AUTONOMOUS_DIRECTION="fix auth bugs" python3 scripts/loop.py /path/to/project

Architecture

Three-layer hierarchy with full context isolation between layers:

Conductor (autonomous/SKILL.md — runs in user's Claude Code session)
  │
  ├── Plans sprint directions (directed phase or exploration phase)
  ├── Dispatches sprint masters via claude -p
  ├── Evaluates sprint results, manages phase transitions
  │
  └── Sprint Master (SPRINT.md — separate claude -p session)
        │
        ├── Sense → Direct → Respond → Summarize loop
        ├── Dispatches workers via claude -p
        ├── Answers worker questions via comms.json protocol
        │
        └── Worker (full Claude session with all tools)
              │
              └── Executes the actual work: reads code, edits files,
                  runs tests, commits changes

Each layer runs in its own Claude session — fresh context per sprint, no bleed between layers.

/quickdo flattens this to two layers: it skips the conductor and runs the sprint master directly.

Backlog — A persistent work queue (.autonomous/backlog.json) that survives across sessions. Workers log out-of-scope discoveries, and the conductor decomposes large missions into deferred items. When exploration runs dry, idle sprints pick from the backlog. Disclosure is progressive: sprint masters see only one-line titles, while the conductor sees full descriptions.

Templates

Worker-task suggestions and boundary blacklists are driven by swappable templates at templates/<name>/template.md. Two templates ship with the package: gstack (selected by default; uses /office-hours, /qa, /investigate, blocks /ship etc.) and default (generic, no toolchain commands).

Select a template per project by writing .autonomous/skill-config.json in your project root:

{ "template": "default" }

The project-level override beats the skill-root default at ~/.claude/skills/autonomous-skill/skill-config.json. Unknown template names fall through to the template named default. To add a new template, create templates/<name>/template.md with ## Allow and ## Block sections and point the config at it.
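
The lookup order described above (project override, then skill-root config, then the generic fallback) can be sketched as follows. This is an illustrative reading of the text, not the code in build-sprint-prompt.py; `resolve_template` and `_read_template_name` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Optional

FALLBACK = "default"  # the generic template named "default"


def _read_template_name(config_path: Path) -> Optional[str]:
    """Return the "template" value from a config file, or None if unusable."""
    try:
        data = json.loads(config_path.read_text())
    except (OSError, ValueError):  # missing file or invalid JSON
        return None
    name = data.get("template") if isinstance(data, dict) else None
    return name if isinstance(name, str) and name else None


def resolve_template(project_root: Path, skill_root: Path) -> Path:
    """Pick template.md: project config wins, then skill-root config, then fallback."""
    name = (
        _read_template_name(project_root / ".autonomous" / "skill-config.json")
        or _read_template_name(skill_root / "skill-config.json")
        or FALLBACK
    )
    candidate = skill_root / "templates" / name / "template.md"
    if not candidate.is_file():
        # Unknown template name: fall through to the generic template.
        candidate = skill_root / "templates" / FALLBACK / "template.md"
    return candidate
```

A real implementation would likely also reject names containing path separators; that check is left out here to keep the sketch short.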


How It Works

  1. Persona — persona.py reads your git history + project docs to understand your coding style. Writes OWNER.md.
  2. Discovery — The conductor talks to you to understand the mission. If you passed a direction in args, it confirms and moves on.
  3. Session — Creates an auto/session-TIMESTAMP branch and initializes conductor-state.json.
  4. Conductor loop — Plan → Dispatch → Monitor → Evaluate → Repeat:
    • Directed phase: breaks your mission into sprint-sized tasks, dispatches one sprint master per task
    • Phase transition: the conductor moves from directed work to exploration when the direction is complete (2 consecutive signals + commits, max sprints reached, or 2 zero-commit sprints)
    • Exploration phase: scans the project across 8 dimensions, picks the weakest, generates improvement sprints
  5. Sprint execution — Each sprint master gets a fresh claude -p session, dispatches a worker, answers questions via comms.json, and writes sprint-summary.json when done.
  6. Merge/discard — Successful sprints merge back to the session branch. Failed sprints are discarded.
  7. Backlog pickup — When exploration dimensions are all solid, the conductor checks the backlog for deferred work items before stopping.
  8. Session ends when all sprints are used up, or when the project feels solid and the backlog is empty.
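
The phase-transition rule in step 4 can be read as a predicate over recent sprint results. The sketch below is one interpretation, under loud assumptions: `SprintResult` and its fields are hypothetical (the real sprint-summary.json schema is not shown here), "2 consecutive signals + commits" is taken to mean the last two sprints each signalled completion and produced at least one commit, and the two zero-commit sprints are assumed to be consecutive.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SprintResult:
    # Hypothetical fields, chosen only for this illustration.
    direction_complete: bool  # sprint master signalled the direction is done
    commits: int              # commits produced by the sprint


def should_leave_directed_phase(history: List[SprintResult], max_sprints: int) -> bool:
    """Return True when any of the three documented triggers fires."""
    if len(history) >= max_sprints:  # max sprints reached
        return True
    last_two = history[-2:]
    if len(last_two) < 2:
        return False
    both_signalled = all(s.direction_complete and s.commits > 0 for s in last_two)
    both_idle = all(s.commits == 0 for s in last_two)
    return both_signalled or both_idle


history = [SprintResult(False, 3), SprintResult(True, 2), SprintResult(True, 1)]
print(should_leave_directed_phase(history, max_sprints=10))
```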

Exploration Dimensions

When the directed mission is complete, the conductor autonomously explores 8 dimensions:

| Dimension | What it audits |
| --- | --- |
| test_coverage | Untested code paths, missing edge cases |
| error_handling | Missing error messages, unhandled failures |
| security | Hardcoded secrets, injection vulnerabilities, input validation |
| code_quality | Dead code, duplication, overly complex functions |
| documentation | README accuracy, missing docstrings, stale docs |
| architecture | Module boundaries, dependency directions, separation of concerns |
| performance | N+1 queries, blocking I/O, missing caching |
| dx | CLI help text, error messages, setup instructions |

Dimensions are scored via fast Python heuristics (explore-scan.py), and the weakest is selected for each exploration sprint.
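
Selecting the weakest dimension amounts to taking a minimum over the scores. A minimal sketch, assuming that higher scores mean healthier and that ties are broken alphabetically (neither is stated above); the score values are made up for illustration and are not output from explore-scan.py.

```python
from typing import Dict


def weakest_dimension(scores: Dict[str, float]) -> str:
    """Return the lowest-scoring dimension; ties go to the alphabetically first name."""
    if not scores:
        raise ValueError("no dimension scores to choose from")
    # min() keeps the first minimal element, so sorting the names first
    # makes the tie-break deterministic.
    return min(sorted(scores), key=scores.__getitem__)


example_scores = {"test_coverage": 0.4, "security": 0.9, "dx": 0.4}
print(weakest_dimension(example_scores))  # dx
```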

Comms Protocol

Workers can't use AskUserQuestion in subagent context. Instead, they write questions to .autonomous/comms.json:

{"status": "waiting", "questions": [{"question": "...", "options": [...]}], "rec": "A"}

The sprint master polls, decides using product intuition (or OWNER.md guidance), and writes back:

{"status": "answered", "answers": ["A"]}

Valid statuses: idle, waiting, answered, done.
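
The exchange above can be sketched from both sides: the worker writes a waiting message and polls, and the sprint master writes the answer. This is a simplified illustration rather than the project's monitor code; the helper names `ask`, `answer`, and `wait_for_answers` are hypothetical, and the write here is a plain (non-atomic) file write.

```python
import json
import time
from pathlib import Path
from typing import List, Optional

VALID_STATUSES = {"idle", "waiting", "answered", "done"}


def write_comms(path: Path, payload: dict) -> None:
    """Write one protocol message, rejecting statuses the protocol does not define."""
    if payload.get("status") not in VALID_STATUSES:
        raise ValueError(f"invalid status: {payload.get('status')!r}")
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload))


def ask(path: Path, question: str, options: List[str], rec: str) -> None:
    """Worker side: post a question with options and a recommended answer."""
    write_comms(path, {
        "status": "waiting",
        "questions": [{"question": question, "options": options}],
        "rec": rec,
    })


def answer(path: Path, answers: List[str]) -> None:
    """Sprint master side: write the decision back."""
    write_comms(path, {"status": "answered", "answers": answers})


def wait_for_answers(path: Path, timeout: float = 60.0,
                     interval: float = 0.5) -> Optional[List[str]]:
    """Worker side: poll until status is "answered"; return None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            data = json.loads(path.read_text())
        except (OSError, ValueError):  # file missing or mid-write
            data = None
        if isinstance(data, dict) and data.get("status") == "answered":
            return data.get("answers", [])
        time.sleep(interval)
    return None
```

Polling a file keeps the two sessions fully decoupled: neither side needs a socket or a shared process, only access to .autonomous/comms.json.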

Worker safety hook (opt-in)

Set AUTONOMOUS_WORKER_CAREFUL=1 to install a PreToolUse hook on every dispatched worker that blocks catastrophic Bash commands:

AUTONOMOUS_WORKER_CAREFUL=1 /autonomous 5 build REST API

Blocks: rm -rf /, rm -rf $HOME, rm -rf /Users|/home, mkfs, dd of=/dev/sd*, fork bombs, device redirects (>, >>, >|, tee, cp to /dev/*), shutdown/reboot, git push --force (all variants), DROP TABLE/DATABASE/SCHEMA, TRUNCATE TABLE. Also catches interpreter wrappers like python3 -c 'os.system("rm -rf /")' and chaining bypasses like echo ok; rm -rf /.

Configured per-sprint via claude --settings <file>, so no global settings change. The hook signals a block by exiting with code 2 and writing a message to stderr; the worker reads "BLOCKED: ..." and adapts.
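
A hook of this kind can be sketched as a small script that reads the tool call as JSON on stdin, checks the Bash command against a deny list, and returns exit code 2 with a "BLOCKED: ..." message on stderr. The patterns below are an illustrative subset written for this example, not the project's actual list, and `check_command` is a hypothetical name. The sketch assumes the hook input carries `tool_name` and `tool_input.command`.

```python
import json
import re
import sys
from typing import Optional

# Illustrative subset only; the real hook covers many more cases.
BLOCK_PATTERNS = [
    (re.compile(r"""\brm\s+-\w*r\w*\s+(?:/|~|\$HOME)/?(?:[\s;&|"')]|$)"""),
     "recursive delete of / or the home directory"),
    (re.compile(r"\bgit\s+push\b.*\s(?:--force|-f)\b"), "git force push"),
    (re.compile(r"\b(?:DROP\s+(?:TABLE|DATABASE|SCHEMA)|TRUNCATE\s+TABLE)\b",
                re.IGNORECASE), "destructive SQL statement"),
    (re.compile(r"\bmkfs(?:\.\w+)?\b"), "filesystem format"),
]


def check_command(command: str) -> Optional[str]:
    """Return a reason string if the command matches a blocked pattern, else None."""
    for pattern, reason in BLOCK_PATTERNS:
        if pattern.search(command):  # search, so chained or wrapped commands match too
            return reason
    return None


def main() -> int:
    """Hook entry point: 0 allows the tool call, 2 blocks it."""
    try:
        event = json.load(sys.stdin)
    except ValueError:
        return 0  # unreadable input: do not block
    if not isinstance(event, dict) or event.get("tool_name") != "Bash":
        return 0
    command = (event.get("tool_input") or {}).get("command", "")
    reason = check_command(command)
    if reason:
        print(f"BLOCKED: {reason}", file=sys.stderr)
        return 2
    return 0

# A standalone hook script would end with: sys.exit(main())
```

Because the check uses a substring search rather than matching the whole command, `echo ok; rm -rf /` and an interpreter wrapper that embeds `rm -rf /` in a string are both caught by the first pattern.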

Checkpoints

Take a human-readable snapshot of the current session anytime:

python3 ~/.claude/skills/autonomous-skill/scripts/checkpoint.py save .
python3 ~/.claude/skills/autonomous-skill/scripts/checkpoint.py save . --title "pre-refactor"
python3 ~/.claude/skills/autonomous-skill/scripts/checkpoint.py list .
python3 ~/.claude/skills/autonomous-skill/scripts/checkpoint.py latest .

Each checkpoint is a markdown file at .autonomous/checkpoints/<ts>-<slug>.md capturing mission, phase, sprint history, backlog summary, exploration dimension scores, git state, and resume guidance. History is retained — old checkpoints stay until manually deleted.

Useful for context switching ("where was I yesterday?"), sharing session state with a teammate, or reviewing sprint output before resuming.

Sprint worktrees (opt-in)

Set AUTONOMOUS_SPRINT_WORKTREES=1 to run each sprint in its own git worktree under .worktrees/sprint-N/:

AUTONOMOUS_SPRINT_WORKTREES=1 /autonomous 5 build REST API
  • Main tree stays on the session branch the whole time — no git checkout -b churn
  • Each sprint works on its own branch in its own directory, fully file-isolated
  • .autonomous/ is symlinked from each worktree back to the main tree, so comms, state, summaries, and backlog all go through one source of truth
  • .worktrees/ is auto-added to .gitignore on first sprint
  • On sprint completion: merge runs first (with --keep-branch), then the worktree is removed, then the branch is deleted. If the merge conflicts, the worktree and branch are preserved for forensic inspection.
  • A pre-existing .worktrees/ or .autonomous/ that is a symlink is refused (repo-escape prevention).

V1 is serial-only — one sprint at a time. Parallel sprint dispatch is deferred to a future PR.

Configuration

| Variable | Default | Description |
| --- | --- | --- |
| MAX_SPRINTS (via args) | 10 | Max conductor sprints |
| MAX_ITERATIONS | 50 | Max iterations for loop.py standalone mode |
| CC_TIMEOUT | 900 | Timeout per CC invocation (seconds) |
| AUTONOMOUS_DIRECTION | (none) | Session focus (e.g., "fix auth bugs") |
| MAX_COST_USD | (none) | Stop when total cost exceeds this |
| DISPATCH_MODE | (auto) | blocking (no tmux), headless (background), or auto (tmux if available) |
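
As a sketch of how a launcher might read these environment variables with the documented defaults, consider the following. The `Settings` class, `load_settings`, and `over_budget` are hypothetical names for illustration; the project's scripts may parse and validate these values differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

DISPATCH_MODES = {"blocking", "headless", "auto"}


@dataclass(frozen=True)
class Settings:
    max_iterations: int
    cc_timeout: int                 # seconds
    direction: Optional[str]
    max_cost_usd: Optional[float]   # None means no budget
    dispatch_mode: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the documented defaults."""
    mode = env.get("DISPATCH_MODE", "auto")
    if mode not in DISPATCH_MODES:
        raise ValueError(f"unknown DISPATCH_MODE: {mode!r}")
    cost = env.get("MAX_COST_USD")
    return Settings(
        max_iterations=int(env.get("MAX_ITERATIONS", "50")),
        cc_timeout=int(env.get("CC_TIMEOUT", "900")),
        direction=env.get("AUTONOMOUS_DIRECTION") or None,
        max_cost_usd=float(cost) if cost else None,
        dispatch_mode=mode,
    )


def over_budget(total_cost_usd: float, settings: Settings) -> bool:
    """True once the running total exceeds MAX_COST_USD (never, if unset)."""
    return settings.max_cost_usd is not None and total_cost_usd > settings.max_cost_usd
```

Passing the environment in as a mapping keeps the function easy to test: a plain dict can stand in for `os.environ`.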

Project Structure

autonomous-skill/
├── autonomous/SKILL.md               # /autonomous — multi-sprint conductor
├── quickdo/SKILL.md                  # /quickdo — fast single-sprint mode
├── SPRINT.md                         # Sprint master: per-sprint execution (inlined into prompt)
├── CLAUDE.md                         # Project instructions for Claude
├── OWNER.md.template                 # Persona template for manual config
├── skill-config.json                 # Default template selector (per-project override at .autonomous/)
├── templates/
│   ├── gstack/template.md            # Allow/Block sections for gstack toolchain
│   └── default/template.md           # Generic fallback, no toolchain assumptions
├── scripts/
│   ├── startup.py                    # SCRIPT_DIR resolution + project context (shared)
│   ├── parse-args.py                 # Parse ARGS → _MAX_SPRINTS + _DIRECTION
│   ├── session-init.py               # Create session branch, init state + backlog
│   ├── build-sprint-prompt.py        # Inline SPRINT.md + params → sprint-prompt.md
│   ├── dispatch.py                   # Blocking / tmux / headless session dispatch
│   ├── monitor-sprint.py             # Poll for sprint-summary.json
│   ├── monitor-worker.py             # Poll comms.json + tmux/process liveness
│   ├── evaluate-sprint.py            # Read summary JSON, update conductor state
│   ├── merge-sprint.py               # Merge or discard sprint branch
│   ├── write-summary.py              # Generate sprint-summary.json
│   ├── conductor-state.py            # State management (atomic writes, PID lock)
│   ├── explore-scan.py               # 8-dimension project scanner
│   ├── backlog.py                    # Cross-session persistent backlog
│   ├── persona.py                    # OWNER.md auto-generation
│   ├── loop.py                       # Standalone launcher (outside CC)
│   ├── master-poll.py                # Manual master polling for comms.json
│   └── master-watch.py               # Dual-channel monitor (comms + JSONL)
├── tests/
│   ├── test_helpers.sh               # Shared test framework
│   ├── test_conductor.sh             # 99 tests: state, phase transitions, exploration
│   ├── test_comms.sh                 # 34 tests: comms.json protocol
│   ├── test_persona.sh               # 20 tests: OWNER.md generation
│   ├── test_explore_scan.sh          # 45 tests: dimension scoring heuristics
│   ├── test_loop.sh                  # 20 tests: standalone launcher
│   ├── test_backlog.sh               # 76 tests: CRUD, progressive disclosure
│   ├── test_build_sprint_prompt.sh   # 25 tests: template resolution, allow/block injection
│   ├── test_eval_output.sh           # 35 tests: eval safety, tmux cleanup
│   └── claude                        # Mock CC binary for testing
├── .claude/skills/                   # Internal dev/test skills
│   ├── smoke-test/SKILL.md           # E2E pipeline smoke test
│   ├── test-worker/SKILL.md          # Spawns worker + auto-answering master
│   ├── capture-worker/SKILL.md       # Capture worker JSONL for inspection
│   ├── diff-sessions/SKILL.md        # Compare two sessions side-by-side
│   ├── clean-sandbox/SKILL.md        # Reset test sandbox
│   └── clean-gstack/SKILL.md         # Delete gstack design doc archives
└── README.md

Generated at runtime (gitignored):

  • OWNER.md — your persona, auto-generated from git + docs
  • .autonomous/conductor-state.json — multi-sprint state machine
  • .autonomous/comms.json — worker↔master IPC
  • .autonomous/sprint-summary.json — per-sprint results

Safety

| Guard | How |
| --- | --- |
| Branch isolation | All work on auto/session-* or auto/quickdo-* branches. Never touches main. |
| Per-sprint branches | Each sprint works on its own branch; merged on success, discarded on failure. |
| Timeout | Each CC invocation capped at 15 min (configurable via CC_TIMEOUT). |
| Cost budget | MAX_COST_USD env var stops the session when exceeded. |
| Excluded workflows | Configured per template (see templates/<name>/template.md ## Block section). |
| Graceful shutdown | SIGINT + sentinel file for clean exit across all layers. |
| 3-strike rule | Same approach fails 3 times → stop and report. |
| Atomic state | Conductor state uses tmp+mv writes, PID lock for concurrency safety. |
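
The "tmp+mv writes, PID lock" guard can be illustrated with a short sketch: write the new state to a temporary file in the same directory, then rename it over the target so readers never see a half-written file, and use an exclusively created lock file to keep a second conductor out. This is an assumption-laden illustration, not conductor-state.py; the function names are hypothetical and stale-lock detection is deliberately omitted.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, state: dict) -> None:
    """Write JSON to a temp file beside the target, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_name, path)  # atomic when both paths are on one filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def acquire_pid_lock(lock_path: Path) -> bool:
    """Create the lock file exclusively; return False if another holder exists."""
    try:
        fd = os.open(str(lock_path), os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as fh:
        fh.write(str(os.getpid()))
    return True


def release_pid_lock(lock_path: Path) -> None:
    """Remove the lock file if it is still there."""
    lock_path.unlink(missing_ok=True)
```

Creating the temporary file in the target's own directory matters: a rename across filesystems is not atomic, so a temp file under /tmp would defeat the purpose.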

Testing

329 tests across 7 suites, all pure bash:

bash tests/test_conductor.sh    # 99 tests
bash tests/test_comms.sh        # 34 tests
bash tests/test_persona.sh      # 20 tests
bash tests/test_explore_scan.sh # 45 tests
bash tests/test_loop.sh         # 20 tests
bash tests/test_backlog.sh      # 76 tests
bash tests/test_eval_output.sh  # 35 tests
python3 -m compileall scripts   # quick syntax check for Python helpers

The test harness uses tests/claude (a mock CC binary) controlled by env vars:

| Variable | Effect |
| --- | --- |
| MOCK_CLAUDE_COST | Reported cost per invocation |
| MOCK_CLAUDE_COMMIT=1 | Make a git commit during the mock run |
| MOCK_CLAUDE_DELAY | Sleep N seconds (for timeout tests) |
| MOCK_CLAUDE_EXIT | Exit code to return |

Reviewing & Merging

# See what the agent did
git log main..auto/session-TIMESTAMP --oneline

# Detailed diff
git diff main..auto/session-TIMESTAMP --stat

# Merge if satisfied
git checkout main && git merge auto/session-TIMESTAMP

# Or cherry-pick specific commits
git cherry-pick COMMIT_HASH

License

MIT
