A sandboxed coding agent for Linux. The LLM is treated as adversarial: every
command it spawns runs inside a custom Rust launcher (agent6-jail) built on
user namespaces, Landlock, seccomp, pivot_root, capset(0), and
NO_NEW_PRIVS, so a misbehaving model cannot escape the workspace, reach the
network beyond the provider endpoint, or corrupt git history.
Features:
- Sandboxed execution for every LLM-chosen child process (verify commands, metric commands, optional shell)
- Works with Anthropic and any OpenAI-compatible endpoint (OpenAI, OpenRouter, Ollama, vLLM, llama.cpp, LM Studio), tuned to stay effective on cheap open-weights models
- Per-step git commits, snapshot-resumable runs, USD and token budgets with hard stops
- Plan, run, review, and ask modes; a live terminal dashboard; persistent transcripts and a searchable run history
- State machines (agent6 machine) for long-running automated tasks: LLM-drafted, operator-reviewed, journaled, and replayable
- Small, fixed LLM tool surface; the only extension point is operator-configured MCP servers, off by default
- Eight runtime dependencies, no telemetry, no auto-update
Requirements:
- Linux. The sandbox uses Linux-only kernel APIs; macOS and Windows are not supported.
- Kernel 6.7 or newer for Landlock TCP rules. Older kernels fall back to filesystem-only Landlock with a warning.
- kernel.unprivileged_userns_clone = 1 for the strict profile (default on Ubuntu, Debian, and most cloud images); without it agent6 falls back to hardened. On Ubuntu 24.04+ with kernel.apparmor_restrict_unprivileged_userns = 1, install the bundled AppArmor profile (packaging/apparmor/agent6-jail; agent6 check sandbox prints the commands) or set that sysctl to 0.
- Python 3.12 or newer, plus an API key for at least one provider.
- Building from source needs a Rust toolchain on PATH; PyPI wheels bundle a prebuilt agent6-jail.
uv tool install agent6
pipx install agent6

Both drop the agent6 entry point in ~/.local/bin; if that is not on your
PATH, run uv tool update-shell or pipx ensurepath and restart your
shell.
From source:
git clone https://github.com/elesiuta/agent6
cd agent6
uv sync
uv run agent6 --help

AGENT6_JAIL_BIN=/path/to/agent6-jail overrides the bundled jail binary.
Shell completion, via argcomplete:
# Bash / Zsh
eval "$(register-python-argcomplete agent6)"
# Fish
register-python-argcomplete --shell fish agent6 > ~/.config/fish/completions/agent6.fish

# Connect a provider once (stored in ~/.config/agent6/, key in a 0600
# secrets file). Works across every repo.
agent6 connect # interactive: pick provider, paste API key
agent6 model worker anthropic claude-sonnet-4-5
# In a project: scaffold .agent6/config.toml + AGENTS.md.
agent6 init
# Audit the effective config: every value and where it came from.
agent6 config show
# Pre-flight: sandbox + config + provider keys + verify_command.
agent6 check
# Run the agent on a task.
agent6 run "add a --json output mode to the CLI"
# Resume an interrupted run from its last tool-call snapshot.
agent6 resume <run-id>
# Read-only code review of a diff. Never touches the worktree.
agent6 review --base origin/main --head HEAD

Config is layered: built-in secure defaults, then the global
~/.config/agent6/config.toml, then the per-repo .agent6/config.toml,
then an explicit --config FILE. A repo can be zero-config when the global
config supplies a provider and model; the one thing a repo always needs is
its verify_command.
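The layering order above can be pictured as a deep merge in which later layers win. This is a minimal sketch, not agent6's actual loader: `merge_layers` is a hypothetical helper, and it assumes that nested tables merge key by key while lists (such as verify_command) are replaced wholesale.

```python
from typing import Any


def merge_layers(*layers: dict[str, Any]) -> dict[str, Any]:
    """Deep-merge config layers in order; later layers override earlier ones."""
    merged: dict[str, Any] = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, dict):
                # Assumption: tables merge key by key instead of being replaced.
                base = merged.get(key)
                merged[key] = merge_layers(base if isinstance(base, dict) else {}, value)
            else:
                # Scalars and lists from a later layer replace the earlier value.
                merged[key] = value
    return merged


# Illustrative layers only; the real defaults are listed in CONFIG.md.
defaults = {"sandbox": {"profile": "auto", "tool_network": "block"}}
global_cfg = {"models": {"worker": {"provider": "anthropic", "model": "claude-sonnet-4-5"}}}
repo_cfg = {
    "sandbox": {"profile": "strict"},
    "workflow": {"verify_command": ["uv", "run", "pytest", "-x"]},
}

effective = merge_layers(defaults, global_cfg, repo_cfg)
```

In this sketch the repo layer only has to supply verify_command, which mirrors the zero-config case where the global config already names a provider and model.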
Other commands:
- agent6 watch [<run-id>]: attach the live TUI to an existing run.
- agent6 plan "<task>": read-only planning pass; execute with agent6 run --from-plan.
- agent6 ask "<question>": read-only Q&A over the repo, including questions about agent6 itself (it consults its bundled docs). Seed context with @path or --file; --run <id> asks about a prior run.
- agent6 memory: persistent agent memory under .agent6/memories/.
- agent6 history search <query>: search persisted transcripts.
- agent6 history graph [<run-id>]: render the persisted task graph.
- agent6 diff [<run-id>]: print the git diff a run produced.
- agent6 machine ...: author and run state machines (.asm.toml); see STATE_MACHINES.md.
- agent6 mcp serve: expose agent6's tools over MCP (stdio).
- agent6 config fill: materialize every effective value into one file.
- agent6 config get/set/unset/add/remove <key> [value]: edit a single dotted leaf. Writes go to the global config by default, --repo for the repo, --machine FILE for a machine overlay. Every edit is re-validated and rolled back if invalid.
Every field has a default, and security-sensitive fields default to the safe value. The full reference is CONFIG.md; sandbox profiles are explained in SECURITY.md.
[sandbox]
profile = "auto" # auto | strict | hardened
agent_network = "providers" # providers | local | open (agent's LLM egress)
tool_network = "block" # block | only_explicit_states | allow (jailed commands)
run_commands = "ask" # yes | no | ask
protect_git = true
protect_agent6 = true
[git]
require_clean_worktree = true
branch_per_run = true
commit_strategy = "per_step" # per_step | squash | stage | none
allow_push = false
[workflow]
verify_command = ["uv", "run", "pytest", "-x"]
[budget]
max_input_tokens = 2000000
max_output_tokens = 200000
# best_effort_usd_limit = 10.0 # optional; see CONFIG.md
[providers.anthropic]
kind = "anthropic"
api_key_env = "ANTHROPIC_API_KEY"
[models.worker]
provider = "anthropic"
model = "claude-sonnet-4-5"

Budget ceilings can be overridden per run: agent6 run --max-usd 5 "...",
or --max-input-tokens / --max-output-tokens on run, plan, and
resume.
Declare any number of [providers.<name>] blocks, each with
kind = "anthropic" or kind = "openai", its own base_url, and
api_key_env. Per-provider http_timeout_s (default 600) caps each HTTP
call.
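As an illustration of the provider rules above, the following sketch validates a set of provider tables. The `Provider` class and `load_providers` function are hypothetical and not agent6's own code; the "local" entry and its OLLAMA_API_KEY variable name are made up for the example, and the positive-timeout check is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_KINDS = {"anthropic", "openai"}


@dataclass(frozen=True)
class Provider:
    name: str
    kind: str
    api_key_env: str
    base_url: Optional[str] = None
    http_timeout_s: int = 600  # default stated in the text above

    def __post_init__(self) -> None:
        if self.kind not in ALLOWED_KINDS:
            raise ValueError(f"provider {self.name!r}: unknown kind {self.kind!r}")
        if self.http_timeout_s <= 0:
            # Assumption: a non-positive timeout is treated as a config error.
            raise ValueError(f"provider {self.name!r}: http_timeout_s must be positive")


def load_providers(table: dict[str, dict]) -> dict[str, Provider]:
    """Build one Provider per [providers.<name>] table."""
    return {name: Provider(name=name, **fields) for name, fields in table.items()}


providers = load_providers(
    {
        "anthropic": {"kind": "anthropic", "api_key_env": "ANTHROPIC_API_KEY"},
        # Hypothetical second provider pointing at a local OpenAI-compatible server.
        "local": {
            "kind": "openai",
            "base_url": "http://localhost:11434/v1",
            "api_key_env": "OLLAMA_API_KEY",
            "http_timeout_s": 120,
        },
    }
)
```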
agent6 uses three model roles:
| Role | Routed by | Used by |
|---|---|---|
| worker | [models.worker] | agent6 run / resume; drives USD-to-token conversion. |
| reviewer | [models.reviewer] | agent6 review and the optional in-loop critic. |
| planner | [models.planner] | agent6 plan. Falls back to worker when unset. |
agent6 model all <provider> <model> sets every role at once. Each role
takes an optional thinking level (off/low/medium/high).
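The planner fallback can be sketched as a small lookup. `resolve_role` is a hypothetical name, and raising an error for any other unset role is an assumption, since the text only documents the planner-to-worker fallback.

```python
def resolve_role(models: dict[str, dict[str, str]], role: str) -> dict[str, str]:
    """Return the model settings for a role, applying the planner fallback."""
    if role in models:
        return models[role]
    if role == "planner" and "worker" in models:
        # Documented behaviour: planner falls back to worker when unset.
        return models["worker"]
    # Assumption: other unset roles are an error rather than a silent fallback.
    raise KeyError(f"no model configured for role {role!r}")


models = {"worker": {"provider": "anthropic", "model": "claude-sonnet-4-5"}}
planner = resolve_role(models, "planner")
```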
The tools given to the LLM are fixed and declared in one place, src/agent6/tools/schema.py; adding one requires a security review note in the commit message.
- Read-only: read_file, list_dir, grep, outline, find_definition, find_references
- Edits: apply_edit (structured blocks), apply_patch (unified diff)
- Execution with operator-fixed argv: run_verify_command, run_metric_command
- Control: finish_run, plus dag_* task-notepad tools backed by the curator subprocess
- Conditional: run_command (argv), only exposed when sandbox.run_commands allows it, and always jailed
There is no write_file, shell, or web_fetch.
agent6 is a single-loop agent: one provider, one model, one message
history. The model drives the run by calling tools; the workflow dispatches
them, snapshots state before every LLM call (so any run is resumable),
commits when verify_command passes, and hard-stops on budget. Module
boundaries (cli -> workflows -> agents -> tools -> sandbox) are enforced
by tach. See
ARCHITECTURE.md for the run/review loops, the curator
subprocess, and the on-disk layout.
For security details (threat model, per-layer breakdown, sandbox
profiles), see SECURITY.md. Defaults are safe:
agent_network = "providers", tool_network = "block",
run_commands = "ask", protect_* = true, git.allow_* = false, and
git_ops.py refuses push, --force, and history rewrites
unconditionally.
Reproducible harnesses live under bench/:
- bench/realworld/: 11 SWE-bench-Lite-style tasks scored by fresh sandboxed verifies on hidden tests. Latest recorded run: agent6 and claude-code both solve 11/11 on the same worker model (claude-sonnet-4-5); agent6 at about $2.60 total, claude-code at about $3.96. Single runs, no variance measured; re-run before quoting.
- bench/agents/: head-to-head against Claude Code, opencode, and aider on Go and Rust tasks with cheap models.
- bench/machine/: machine create attempts, cost, and validation results.
- bench/perf/: a perf-optimization harness for local experimentation; single-run numbers are too noisy to quote.
Every run ends with a per-model token and cost summary. Model prices are
fetched from the provider's models endpoint and cached (OpenRouter
publishes them; Anthropic does not, so its models report an unknown
price). Where the provider reports per-call cost, that figure is used
directly. The [budget] ceilings hard-stop the run; a stopped run is
resumable.
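A hard stop on token ceilings might look roughly like this. It is a sketch under assumptions: `Budget` and `BudgetExceeded` are hypothetical names, usage is checked after each call is recorded, and the comparison is strict (exactly reaching the ceiling does not stop the run). The real accounting, including USD limits, is described in CONFIG.md.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a ceiling is crossed; the caller stops the run."""


@dataclass
class Budget:
    max_input_tokens: int
    max_output_tokens: int
    input_tokens: int = 0
    output_tokens: int = 0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Add one call's usage, then hard-stop if either ceiling is crossed."""
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        if self.input_tokens > self.max_input_tokens:
            raise BudgetExceeded("input token ceiling exceeded")
        if self.output_tokens > self.max_output_tokens:
            raise BudgetExceeded("output token ceiling exceeded")

    def remaining(self) -> tuple[int, int]:
        """Tokens left before each ceiling, as (input, output)."""
        return (
            self.max_input_tokens - self.input_tokens,
            self.max_output_tokens - self.output_tokens,
        )


budget = Budget(max_input_tokens=2_000_000, max_output_tokens=200_000)
budget.record(input_tokens=1_500_000, output_tokens=50_000)
```

Because usage is recorded before the check, the totals stay accurate even for the call that crosses the ceiling, which is the information a resumed run would need.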
With stdout a TTY, agent6 run opens a terminal dashboard (task DAG,
budget bar, tool table, live reasoning pane, log tail, latest diff);
--no-tui and -i (stdin REPL) opt out. Approval and Ctrl-C steer
prompts appear as modals, with a /dev/tty fallback when no TUI is
present. agent6 plan, agent6 ask, and agent6 machine create stream
reasoning and answers to the terminal. Attach from another shell with
agent6 watch [<run-id>]; agent6 watch --plain is a plain-text tail.
The dashboard renders the JSONL event stream at
.agent6/runs/<run-id>/logs.jsonl, which is also the contract for
external viewers (vocabulary in ARCHITECTURE.md).
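An external viewer could consume the event stream along these lines. This is a sketch, not a reference client: the event vocabulary lives in ARCHITECTURE.md, so the code treats each line as an opaque JSON object, and the "type" key used for grouping is an assumption made for illustration.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterator


def read_events(path: Path) -> Iterator[dict]:
    """Yield one dict per well-formed JSONL line, skipping blanks and partial lines."""
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                # A live run may leave a partially written last line; skip it.
                continue
            if isinstance(event, dict):
                yield event


def summarize(path: Path, key: str = "type") -> Counter:
    """Count events by a grouping key ("type" is a hypothetical field name)."""
    return Counter(str(event.get(key, "<missing>")) for event in read_events(path))
```

A viewer would call `summarize(Path(".agent6/runs") / run_id / "logs.jsonl")` with the run id of interest.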
Each run's state lives under .agent6/runs/<run-id>/: append-only task
graph, per-call snapshots that drive agent6 resume, full transcripts,
and the event log. It is written exclusively by a sandboxed
agent6-curator subprocess over a validated IPC channel, so a bug in the
agent cannot scribble over the run directory.
If [notify] declares on_complete = [...], agent6 runs that argv after
every agent6 run / resume with AGENT6_RUN_ID, AGENT6_RUN_DIR,
AGENT6_RUN_OK, and AGENT6_RUN_REASON set. The hook runs outside the
jail as your user; the argv is operator-controlled.
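A hook script for on_complete could be as small as the sketch below. Only the four environment variable names come from the text; `format_summary` is a hypothetical helper, and because the value format of AGENT6_RUN_OK is not specified here, the script reports it verbatim instead of interpreting it.

```python
import os
import sys
from typing import Mapping


def format_summary(env: Mapping[str, str]) -> str:
    """Build a one-line summary from the variables agent6 sets for the hook."""
    run_id = env.get("AGENT6_RUN_ID", "<unknown>")
    run_dir = env.get("AGENT6_RUN_DIR", "<unknown>")
    ok = env.get("AGENT6_RUN_OK", "")
    reason = env.get("AGENT6_RUN_REASON", "")
    # The values are passed through as-is; their exact format is not assumed.
    return f"agent6 run {run_id} finished: ok={ok!r} reason={reason!r} dir={run_dir}"


if __name__ == "__main__":
    print(format_summary(os.environ), file=sys.stderr)
```

Since the hook runs outside the jail as your user, it is worth keeping such a script short and free of anything derived from model output.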
Read AGENTS.md first. The repo's verify command decides whether a PR is landable:
uv run ruff check && uv run ruff format --check && \
uv run pyright && uv run tach check && uv run pytest

Changes under sandbox/, tools/, git_ops.py, providers/, or
graph/curator must include a security review note in the commit message.