Version: v0.2.4 (MVP design, five review revisions)
Date: 2026-05-13
Authors: architecture / platform team
Status: design locked, ready for implementation
Revision log: see CHANGELOG.md; paragraphs tagged [v0.2] / [v0.2.1] / [v0.2.2] / [v0.2.3] / [v0.2.4] mark revision items.
Based on the first architecture review; all 10 items were adopted. See CHANGELOG.md for the detailed diff.
- Phase 0 scope tightened: §18 rewritten; P0 cut to ~1/3
- Streaming cost header removed: streaming uses the SSE `aicg.usage` terminating event (§5.4)
- Decision object unified as `primary_action + modifiers[] + side_effects[]` (§7.3, §8, §16.2)
- Budget Reserve/Commit/Release (§9.5, §12.5, §16.8)
- Routing selection is replayable (trace_id-derived seed) (§9.2, §12.3)
- IR decoupled from the Anthropic SDK: §10 introduces our own `ir.*` types
- LP→GW switched to a dedicated `POST /v1/agent/forward` endpoint (§5.4)
- Dashboard auth collapsed to a single API-key model (§13.1)
- Raw defaults to `metadata_only` (§11.4)
- Endpoint allowlist landed in the main document (upgraded to a `provider_endpoints` registry in v0.2.2) (§9.1)
AgentGate is a CCR-like coding-agent gateway aimed at enterprise AI Platform / Engineering Productivity teams. It uniformly governs Claude Code / OpenAI-compatible coding-agent traffic inside the enterprise.
There is no goal of drop-in CCR config compatibility; the client local proxy exposes two wire-level endpoints: Claude Code-compatible (Anthropic Messages API) and OpenAI-compatible (Chat Completions).
Key acronyms
| Acronym | Meaning |
|---|---|
| LP | Local Proxy (client-side daemon) |
| GW | Enterprise Gateway |
| IR | Internal Representation (the gateway's internal canonical form, fixed to the Anthropic Messages API) |
| MP | Model Pool (four classes: cheap / standard / strong / private_strong) |
| PA | Provider Adapter |
| DEK / KEK | Data / Key Encryption Key |
Confirmed bidirectionally with the product owner; treated as hard constraints for implementation.
| ID | Decision |
|---|---|
| A1 | LP integration = HTTP local proxy + environment-variable hijacking |
| A2 | LP→GW transport = dedicated POST /v1/agent/forward endpoint; body is {envelope, wire:{protocol, body}}; the LP still exposes the Anthropic / OpenAI wire endpoints to local agents |
| A3 | LP behavior on GW unreachable = fail-closed (admin can configure a fallback to the cheap pool direct connection) |
| B1 | MVP identity = per-user API key + admin YAML for user→team mapping; OIDC device-code reserved for later |
| B2 | Repo identity = repo binding token (primary) + git remote URL/SHA reverse lookup (fallback) |
| B3 | Multi-tenancy = MVP is single-tenant private deployment; the code internally abstracts on tenant_id |
| C1 | Policy DSL = YAML + CEL, priority + deny-overrides |
| C2 | shadow_eval = MVP is a wire-level placeholder + dual emit into the table; no scorecard loop |
| C3 | require_approval = async + pending list in the dashboard |
| D1 | Output streaming scanning = post-hoc full-output scan; alert only, no blocking |
| D2 | Security scanning engine = forked gitleaks ruleset + detect-secrets |
| E1 | [Phased from v0.2] P0/P1: default metadata_only (Postgres metadata row only); P2+: S3-compatible object storage (body) + Postgres (metadata) |
| E2 | [Phased from v0.2] P0/P1: no KMS deployed; P2+: envelope encryption per-record DEK + KMS root key |
| E3 | [Phased from v0.2] P0/P1: TTL only on metadata rows; P2+: per-prefix policies driven by repo |
| F1 | Provider key = gateway-managed vault + per-team BYO co-existing |
| F2 | Cost data = provider usage fields (primary) + tokenizer estimate (fallback for local models / Ollama) |
| H1 | team_id = admin YAML maintains both user→team and repo→team mappings |
| H2 | [Phased from v0.2.1] P0/P1: append-only Postgres, self_hash optional; P3: periodic hash chain + daily root dual-write to S3 |
| Q1 | LP physical layout = single process, dual endpoints (/anthropic/... + /openai/...) |
| Q2 | Internal IR = Anthropic Messages API |
| Q3 | Anthropic-specific features when crossing providers = auto-strip + degraded_features marker; tool_use ↔ OpenAI tools / count_tokens must be real implementations |
| Q4 | [Phased from v0.2.1] P0: plain Postgres tables + plain indexes; P1+: TimescaleDB hypertable + continuous aggregate |
| Q5 | [Phased from v0.2.1] P0: Reserve/Commit/Release interfaces landed; cap is soft-warn only and non-blocking; P1+: hard-cap enforcement, triggers require_approval |
| Q6 | Client-side reporting scope = paths + language + sha256 fingerprints + diff summary stats; does not transmit file contents |
| Q7 | Configuration source = YAML + git ops |
| Q8 | private_strong = Azure OpenAI / Bedrock private-domain endpoints + self-hosted vLLM / TGI all belong to this tier |
| Q9 | Approval notification = generic webhook payload + dashboard pending list; not bound to Slack |
| Q10 | [v0.2 unified] Dashboard auth = single API key + RBAC; first start bootstraps with a one-shot setup_token; no separate password channel |
| Q11 | [v0.2.1 phased] Secret found in streaming response = not recalled; written to audit + alert + post_hoc_violation flag (effective from P3: the MVP does not deploy the output scanner) |
| Q12 | [v0.2.1 phased] P0: LP exposes the Anthropic endpoint only; P1: add the OpenAI-compatible endpoint (streaming + non-streaming) |
| Q13 | trace_id minted by the gateway; session_id minted by the client (hash of agent_pid + start_ts + repo) |
| Q14 | On fallback, each provider call writes its own cost_event sharing the same trace_id |
| Q15 | SSE mid-stream error = break the stream directly to the client; let Claude Code retry on its own |
| Q16 | Local model assumption = uniformly OpenAI-compatible (Ollama / vLLM / LM Studio / TGI / SGLang) |
| Q17 | BYO team key = YAML references a secret reference (env://, file://, vault://) |
| Q18 | [v0.2 phased] Raw prompt RBAC three layers = developer (own) / team_admin (metadata + redacted raw, break-glass) / platform_admin (full + access_audit); takes effect from P2+: raw defaults to metadata_only in P0/P1 with no persistence |
| Q19 | [v0.2 phased] redact = inbound text placeholder substitution [REDACTED-{TYPE}-{HASH8}]; the original text retained in gateway raw storage only takes effect from P2+; P1 lands the redact logic but does not retain the original (no raw store) |
| Q20 | Policy hot reload = inotify + atomic reload; new rules only apply to new requests |
| R1 | Data residency = MVP single region |
| R2 | Idempotency = internal-retry deduplication inside the gateway only; no idempotency-key exposed externally |
| R3 | Policy authoring = platform_admin only in MVP |
| R4 | Client upgrades = no auto-update; brew/npm channels + startup version check |
| R5 | Both streaming and non-streaming are supported |
| R6 | [v0.2.4 CLI examples sync] Onboarding = admin invite link → aicg login --invite <token> → long-lived API key |
| R7 | Repo binding = auto-bind on first use; binding token stored in .git/aicg-binding; admin can disable |
| R8 | Scan latency budget = secret pre-scan p95 must complete < 150 ms; output-scan timeouts only alert |
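The placeholder format from Q19, `[REDACTED-{TYPE}-{HASH8}]`, can be sketched as below. This is only an illustration under stated assumptions: HASH8 is taken to be the first 8 hex characters of the SHA-256 of the matched text, TYPE is upper-cased, and `Finding`, `placeholder`, and `redact` are hypothetical names that do not appear elsewhere in this document.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// Finding is a hypothetical scanner hit: a finding type plus a byte range.
type Finding struct {
	Type   string
	Offset int
	Length int
}

// placeholder builds [REDACTED-{TYPE}-{HASH8}].
// Assumption: HASH8 = first 8 hex chars of sha256(matched text).
func placeholder(typ, secret string) string {
	sum := sha256.Sum256([]byte(secret))
	return fmt.Sprintf("[REDACTED-%s-%s]", strings.ToUpper(typ), hex.EncodeToString(sum[:])[:8])
}

// redact substitutes every finding, working from the end of the text
// backwards so that earlier offsets stay valid.
// It assumes findings do not overlap and lie inside the text.
func redact(text string, findings []Finding) string {
	fs := append([]Finding(nil), findings...)
	sort.Slice(fs, func(i, j int) bool { return fs[i].Offset > fs[j].Offset })
	for _, f := range fs {
		end := f.Offset + f.Length
		text = text[:f.Offset] + placeholder(f.Type, text[f.Offset:end]) + text[end:]
	}
	return text
}

func main() {
	in := "key=AKIAEXAMPLE123 rest"
	fmt.Println(redact(in, []Finding{{Type: "aws_key", Offset: 4, Length: 14}}))
}
```

Because the hash is derived from the matched text, the same secret maps to the same placeholder across requests, which keeps redacted prompts comparable without retaining the original.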
+-----------------------------------------------------------------------------+
| Developer Workstation |
| |
| Claude Code Cursor / Aider / Codex CLI / custom OAI agent |
| (Anthropic API) (OpenAI Chat Completions) |
| | | |
| | env: ANTHROPIC_BASE_URL = |
| | http://127.0.0.1:7777/anthropic |
| | env: OPENAI_BASE_URL = |
| | http://127.0.0.1:7777/openai |
| v v |
| +----------------------------------------------------------------------+ |
| | AgentGate Local Proxy (single binary daemon) | |
| | +----------+ +----------+ +----------+ +----------+ +-------------+ | |
| | | dual- | | session | | metadata | | secret | | repo | | |
| | | protocol | | / trace | | tagger | | pre-scan | | binding | | |
| | | router | | id mgr | | (heur v0)| | (gitleaks)| | manager | | |
| | +----------+ +----------+ +----------+ +----------+ +-------------+ | |
| | | |
| | +--------------------------+ +---------------------------------+ | |
| | | local config (~/.aicg/) | | gateway client (mTLS, retries) | | |
| | | credentials, gw URL, | | | | |
| | | cached policy snippets | | | | |
| | +--------------------------+ +---------------------------------+ | |
| +----------------------------------------------------------------------+ |
| | |
| | HTTPS (mTLS optional, API key required) |
+----------------------------------|------------------------------------------+
|
v
+-----------------------------------------------------------------------------+
| AgentGate Enterprise Gateway |
| |
| +-------------------------------------------------------------------------+ |
| | Edge Layer (HTTPS, IP allowlist, global rate-limit, trace_id mint) | |
| +-------------------------------------------------------------------------+ |
| | |
| v |
| +-------------------------------------------------------------------------+ |
| | Ingress Pipeline | |
| | (1) auth verify -> (2) repo binding verify -> (3) server reclassify | |
| | -> (4) input safety scan -> (5) policy decision -> (6) redact apply | |
| | -> (7) protocol normalize (OpenAI in -> Anthropic IR) | |
| +-------------------------------------------------------------------------+ |
| | | | |
| v v v |
| +----------+ +-------------------+ +-----------------+ |
| | Server | | Policy Engine | | Budget / Quota | |
| | Reclassi |<------+ | (CEL decider + |-->| Service | |
| | -fier | | | priority/deny) | | (TimescaleDB + | |
| +----------+ | +-------------------+ | Postgres) | |
| | | ^ +-----------------+ |
| | | | inotify reload |
| | | +------+-------------+ |
| | | | Config FS | |
| | | | YAML + CEL | |
| | | | (policies, | |
| | | | model pools, | |
| | | | budgets, | |
| | | | user/team/repo) | |
| | | +--------------------+ |
| | | |
| | +----+--------------+ |
| | | Routing Engine | |
| +------->| (pool select, |--+ |
| | fallback chain, | | |
| | circuit breaker) | | |
| +-------------------+ | |
| v |
| +-----------------------------------------------+ |
| | Provider Adapter Layer | |
| | +----------+ +----------+ +-----------+ | |
| | | Anthropic| | OpenAI / | | OpenAI- | | |
| | | | | Azure / | | compatible| | |
| | | | | OpenRouter| | (Ollama, | | |
| | | | | / LiteLLM | | vLLM, | | |
| | | | | | | TGI) | | |
| | +----------+ +----------+ +-----------+ | |
| +-----------------+-------------------------- --+ |
| | |
| v (streaming SSE) |
| +---------------------------+ |
| | Upstream Model Providers | |
| +---------------------------+ |
| | |
| +----------------------|------------------------------------------------+ |
| | Egress Pipeline v | |
| | (1) stream tap -> (2) post-hoc output scan (async) -> | |
| | (3) usage extractor -> (4) cost calc -> (5) SSE re-emit -> | |
| | (6) raw store write (envelope-encrypted) | |
| +-----------------------------------------------------------------------+ |
| | | | | |
| v v v v |
| +----------+ +-----------+ +-----------+ +-------------+ |
| | Postgres | |TimescaleDB| | S3 / MinIO| | Webhook Out | |
| | audit, | | cost & | | raw body | | alerts, | |
| | metadata,| | routing | | KMS-DEK | | approvals, | |
| | hash | | TS | | per-repo | | violations | |
| | chain | | | | prefix | | | |
| +----------+ +-----------+ +-----------+ +-------------+ |
| |
| +-----------------------------------------------------------------------+ |
| | Dashboard (Next.js, served by GW or separate) | |
| | cost views | audit search | approval queue | policy RO viewer | |
| | | access audit | webhook config | |
| +-----------------------------------------------------------------------+ |
+------------------------------------------------------------------------------+
| Unit | Form | Deployment |
|---|---|---|
| `aicg-lp` | LP daemon, single binary | Developer machine, brew / npm / scoop |
| `aicg-gw` | Gateway, single binary (with embedded dashboard static assets) | Enterprise VPC, docker-compose / helm |
| Postgres + TimescaleDB | Single instance | Same VPC |
| S3-compatible object storage | MinIO (self-hosted) or AWS S3 / GCS | Same VPC or controlled object storage |
| KMS | AWS KMS / GCP KMS / Vault Transit / local file KEK | Reuse the existing enterprise stack |
| Module | Responsibility | Key dependencies |
|---|---|---|
| `edge` | TLS termination, IP allowlist, global rate-limit, trace_id injection | chi router, golang.org/x/time/rate |
| `auth` | API key verification, user/team resolution, RBAC | bcrypt, Postgres |
| `repo_binding` | Verify repo binding token, git remote fallback lookup | ed25519 |
| `metadata` | Envelope parsing + server reclassifier | CEL |
| `safety` | Input scan (block/redact), output post-hoc scan (alert) | gitleaks rules + detect-secrets-py via gRPC sidecar |
| `policy` | YAML + CEL decision, priority + deny-overrides, hot reload | cel-go, fsnotify |
| `routing` | Pool selection, fallback chain, circuit breaker | sony/gobreaker |
| `provider` | Provider-adapter dispatch, transformer pipeline | Vendor SDKs |
| `transformer` | Anthropic IR ↔ OpenAI; cross-provider feature degradation | In-house |
| `budget` | Soft / hard threshold check, consumption writes | TimescaleDB |
| `cost` | Usage parsing, tokenizer estimate, pricing table | tiktoken-go, anthropic-tokenizer |
| `audit` | Append-only writes, periodic hash-chain job | Postgres |
| `raw_store` | Object-storage put/get, envelope encryption, TTL job, access audit | S3 SDK, KMS SDK |
| `webhook` | Outbound notifications (approvals / alerts / violations) | retry queue |
| `approval` | Pending list, approve/reject API | Postgres |
| `dashboard_api` | RESTful + SSE for the dashboard | chi |
| `dashboard_ui` | Next.js (built and embedded into the binary via embed.FS) | Next.js export |
| Module | Responsibility |
|---|---|
| `dual_router` | Path prefixes /anthropic/* and /openai/* |
| `session` | On startup compute session_id = sha256(agent_pid + start_ts + repo) |
| `tagger_v0` | Heuristic inference of task_type / complexity / agentic_loop |
| `pre_scan` | gitleaks rules + custom regex |
| `repo_binder` | Auto-bind flow; binding-token caching |
| `gw_client` | mTLS, retry, timeout, version negotiation |
| `local_config` | ~/.aicg/config.yaml + ~/.aicg/credentials |
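The `session` module's derivation, session_id = sha256(agent_pid + start_ts + repo), might look like the sketch below. The field separator (`|`), the use of Unix nanoseconds for start_ts, and the hex encoding are assumptions made for the example; the design only fixes the three inputs and the hash function.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sessionID hashes the three inputs named in the design.
// Assumptions: fields are joined with "|" so that adjacent values cannot
// run together, startTS is Unix nanoseconds, and the digest is hex-encoded.
func sessionID(agentPID int, startTS int64, repo string) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%d|%d|%s", agentPID, startTS, repo)))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(sessionID(4242, 1700000000000000000, "repo_payments_core"))
}
```

An explicit separator matters here: without it, (pid=12, ts=34) and (pid=1, ts=234) would hash the same byte string.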
Example: "Claude Code initiates a streaming `code_edit` request from a repo". Each step labels the responsible component, the phase tag, and the failure handling. Phase tags: `[P0]` = effective at MVP; `[P1]` = from Phase 1; `[P2+]` = from Phase 2; `[P3]` = from Phase 3.
Claude Code
| (1) Anthropic Messages API request, streaming=true
v
LP /anthropic/v1/messages [P0]
|--- (2) Load ~/.aicg/credentials -> attach Authorization: Bearer <user-api-key>
|--- (3) Check repo binding: read .git/aicg-binding; if missing, synchronously call GW /v1/repo/bind
|--- (4) session/trace: inject X-AICG-Session-Id (trace_id is minted by GW)
|--- (5) tagger_v0 emits metadata (task_type / language / file_paths / ...)
|--- (6) pre_scan: gitleaks scan over the prompt text [P1]
| Hit -> set contains_secret_like_pattern=true into envelope.scan
| 150ms timeout -> fail-closed, return 5xx to Claude Code
|--- (7) Build request body: { envelope, wire:{protocol:"anthropic_messages", body:<original>} }
| [v0.2] No more X-AICG-Envelope header
v
GW POST /v1/agent/forward [P0]
|--- (8) Edge: TLS termination, per-API-key rate limit, mint trace_id (UUIDv7)
| Emit X-AICG-Trace-Id immediately in response headers
v
GW Ingress
|--- (9) auth: API key hash lookup -> user_id/team_id/role; failure -> 401 [P0]
|--- (10) repo_binding: ed25519 verify + machine_id check; failure -> 403 [P0]
|--- (11) Server reclassifier: recompute task_type/complexity/sensitivity [P0]
| Client hints are only used as a prior weighting
|--- (12) safety.input: gitleaks (P1) + detect-secrets sidecar (P3) [P1]
| Hit + policy decides block -> 451; hit + redact -> placeholder substitution
|--- (13) policy.Decide(...) -> Decision{ [P0/partial P1]
| primary_action, // P0 only `route`; P1 adds block/redact/require_approval
| modifiers, side_effects,
| model_pool, shadow_pool,
| redactions, reasons,
| require_approval_id }
| block -> 451 + audit; require_approval -> 202 + X-AICG-Approval-Id
| shadow_eval -> primary path continues (real dual emit in P2+)
|--- (14) Protocol normalize: convert inbound OpenAI to IR; Anthropic inbound passes through [P1, shipped with LP OpenAI endpoint]
v
GW Budget.Reserve(estimated_cents) [P0 interface / P1 enforce]
|--- (15) estimated = input_tokens × price_in + max_output_tokens × price_out
| P0: only record outstanding, no blocking
| P1: exceeding hard cap -> return ErrHardCapExceeded -> turn into require_approval
v
GW routing.engine
|--- (16) seed = sha256(trace_id : pool : attempt_no); write routing_event.seed [P0]
| ChaCha8(seed) picks a chain by weight; fallback_pool recursion (P1 fallback/breaker)
v
GW provider.adapter
|--- (17) transformer.outbound: IR -> target provider wire [P0]
| Cross-provider: strip cache_control / extended thinking
| tool_use ↔ openai tools real conversion; record degraded_features[]
|--- (18) Initiate upstream streaming SSE; timeout/5xx -> fallback chain (from P1) [P0/P1]
v
Upstream Provider
|
v
GW egress.stream_tap [P0]
|--- (19) Forward + accumulate into a bounded buffer (max_buffer_size)
| Translate back to LP per wire.protocol; inject the custom SSE aicg.usage event before stream end
|--- (20) Post-hoc output scan -> webhook alert (non-blocking) [P3]
v
GW Settle
|--- (21) usage extractor + cost calc -> Budget.Commit(reservation_id, actual) [P0 three tables / P1 TimescaleDB]
| Failure path calls Budget.Release(reservation_id)
| Write cost_event + routing_event
|--- (22) raw_store.put (conditional by storage_policy) [P2+]
| metadata_only -> write raw_record metadata row only
| redacted_only / full -> KMS encrypt to S3
| P0/P1 default is metadata_only -> only the metadata-row branch runs
|--- (23) audit.append: Postgres append-only; self_hash optional (P0/P1) [P0]
| From P3 enable prev_hash chain + daily root dual-write
v
LP <- SSE stream -> Claude Code (passthrough upstream wire; aicg.usage consumed internally by LP)
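Step (16) of the flow above (a trace_id-derived seed feeding a ChaCha8 weighted pick) can be sketched as follows. It assumes Go 1.22+ for `math/rand/v2`; the `Chain` type, its integer weights, and the `:`-joined seed string are illustrative choices, not the gateway's actual types.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/rand/v2"
)

// Chain is a hypothetical routing candidate with a relative weight.
type Chain struct {
	Name   string
	Weight int
}

// routingSeed derives the 32-byte seed from sha256(trace_id : pool : attempt_no).
// The exact byte layout of the hashed string is an assumption.
func routingSeed(traceID, pool string, attemptNo int) [32]byte {
	return sha256.Sum256([]byte(fmt.Sprintf("%s:%s:%d", traceID, pool, attemptNo)))
}

// pickChain makes a weighted choice that is fully determined by the seed,
// so storing the seed in routing_event is enough to replay the selection.
func pickChain(seed [32]byte, chains []Chain) (Chain, bool) {
	total := 0
	for _, c := range chains {
		if c.Weight > 0 {
			total += c.Weight
		}
	}
	if total == 0 {
		return Chain{}, false
	}
	n := rand.New(rand.NewChaCha8(seed)).IntN(total)
	for _, c := range chains {
		if c.Weight <= 0 {
			continue
		}
		if n < c.Weight {
			return c, true
		}
		n -= c.Weight
	}
	return Chain{}, false // not reached when total > 0
}

func main() {
	chains := []Chain{{"anthropic-prod", 80}, {"bedrock-eu", 20}}
	c, _ := pickChain(routingSeed("01hxxx", "standard", 1), chains)
	fmt.Println(c.Name)
}
```

Including attempt_no in the seed means a retry draws a fresh but still replayable choice rather than repeating the one that just failed.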
Failure / exception branches
| Stage | Failure | Behavior |
|---|---|---|
| 6 | pre_scan timeout (P1) | LP fail-closed, 5xx + user prompt |
| 9-10 | auth/binding | 401/403, LP passes through |
| 12 | Input scan timeout (P1) | fail-closed for secret rules; fail-open for PII |
| 13 | policy block | 451 with policy_reason in body |
| 13 | require_approval | 202 + X-AICG-Approval-Id; LP prompts the user |
| 15 | Budget.Reserve fail (P1) | Per policy: turn into require_approval or return 429 |
| 18 | Upstream 5xx before first-token flush | LP retries internally once (same trace_id, attempt_no++); on continued failure pass 502 to the agent |
| 18 | Upstream 5xx after first-token flush | LP has already returned 200 OK; status cannot be changed; only stream-break + inject event: aicg.error data{partial:true}; the agent retries itself |
| 20 | Post-hoc scan hit (P3) | No recall; webhook alert + audit post_hoc_violation=true |
| 22 | raw_store.put failure | best-effort async DLQ; does not affect the response |
| any | GW unreachable | LP fail-closed (default) or admin-configured fallback to cheap pool direct connect |
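The Reserve / Commit / Release cycle used in steps (15) and (21) can be illustrated with an in-memory ledger. This is a minimal sketch, not the Postgres-backed service: the `Ledger` type, its `enforce` switch (false for the P0 soft-warn mode, true for P1 hard-cap enforcement), and `ErrUnknownReservation` are hypothetical; only the three method names and `ErrHardCapExceeded` come from the flow above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var (
	ErrHardCapExceeded    = errors.New("budget: hard cap exceeded")
	ErrUnknownReservation = errors.New("budget: unknown reservation") // hypothetical
)

// Ledger tracks committed spend plus outstanding reservations, in cents.
type Ledger struct {
	mu        sync.Mutex
	capCents  int64
	enforce   bool // false: P0 soft-warn only; true: P1 hard cap
	used      int64
	held      map[int64]int64
	heldTotal int64
	nextID    int64
}

func NewLedger(capCents int64, enforce bool) *Ledger {
	return &Ledger{capCents: capCents, enforce: enforce, held: map[int64]int64{}}
}

// Reserve holds the estimated cost; outstanding holds count against the cap.
func (l *Ledger) Reserve(estimatedCents int64) (int64, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.enforce && l.used+l.heldTotal+estimatedCents > l.capCents {
		return 0, ErrHardCapExceeded
	}
	l.nextID++
	l.held[l.nextID] = estimatedCents
	l.heldTotal += estimatedCents
	return l.nextID, nil
}

// settle drops the hold and books the actual cost (zero on release).
func (l *Ledger) settle(id, actualCents int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	est, ok := l.held[id]
	if !ok {
		return ErrUnknownReservation
	}
	delete(l.held, id)
	l.heldTotal -= est
	l.used += actualCents
	return nil
}

func (l *Ledger) Commit(id, actualCents int64) error { return l.settle(id, actualCents) }
func (l *Ledger) Release(id int64) error              { return l.settle(id, 0) }

// Used returns committed spend only.
func (l *Ledger) Used() int64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.used
}

func main() {
	l := NewLedger(100, true)
	id, _ := l.Reserve(60)
	_, err := l.Reserve(60)
	fmt.Println(err) // second hold would exceed the cap
	_ = l.Commit(id, 30)
	fmt.Println(l.Used())
}
```

Counting outstanding holds against the cap is what stops several concurrent streams from each passing the check and jointly overshooting the budget.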
http://127.0.0.1:<port>/anthropic/v1/messages
http://127.0.0.1:<port>/anthropic/v1/messages/count_tokens
http://127.0.0.1:<port>/openai/v1/chat/completions
http://127.0.0.1:<port>/openai/v1/models (passthrough; filtered by allowlist)
http://127.0.0.1:<port>/_aicg/health
http://127.0.0.1:<port>/_aicg/version
http://127.0.0.1:<port>/_aicg/whoami
http://127.0.0.1:<port>/_aicg/sessions (debug)
aicg login --invite <token> # [v0.2.4] exchange an invite for a long-lived API key
aicg logout
aicg start [--port 7777] [--config <path>] [--foreground]
aicg stop
aicg status
aicg bind-repo [--path .] # manual bind (use when auto-bind fails)
aicg config show
aicg config set gateway_url=...
aicg policy show # pull a summary of applicable policy from GW
aicg version
aicg doctor # self-check: gateway reachable, credentials, binding, env vars
aicg env prints a shell snippet that the developer sources or appends to their rc file:
export ANTHROPIC_BASE_URL="http://127.0.0.1:7777/anthropic"
export ANTHROPIC_API_KEY="sk-aicg-noop" # LP swaps in the real credential
export OPENAI_BASE_URL="http://127.0.0.1:7777/openai/v1"
export OPENAI_API_KEY="sk-aicg-noop"

The LP drops the agent-supplied Authorization header and attaches its own user API key on the call to GW (GW identifies the user from that key and then substitutes the appropriate provider key downstream).
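The credential swap described above could be as small as the following sketch. `swapCredentials` is a hypothetical helper; stripping `X-Api-Key` is an added assumption, based on Anthropic-style clients sending the key in that header rather than in `Authorization`.

```go
package main

import (
	"fmt"
	"net/http"
)

// swapCredentials discards whatever placeholder credential the agent sent
// and attaches the LP's own user API key for the LP→GW call.
func swapCredentials(h http.Header, userAPIKey string) {
	h.Del("Authorization")
	h.Del("X-Api-Key") // assumption: Anthropic-style clients send the key here
	h.Set("Authorization", "Bearer "+userAPIKey)
}

func main() {
	h := http.Header{}
	h.Set("Authorization", "Bearer sk-aicg-noop")
	h.Set("X-Api-Key", "sk-aicg-noop")
	swapCredentials(h, "aicg-user-key")
	fmt.Println(h.Get("Authorization"), h.Get("X-Api-Key") == "")
}
```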
The v0.1 design was "LP passes through upstream wire + envelope in headers". The review pointed out that a realistic envelope (64 file paths + fingerprints + diff summary + scan findings) easily exceeds the 8KB header limit, and falling back to the body breaks the "transparent wire" principle, so that design is half-baked by construction. v0.2 changes this: LP→GW uses a dedicated envelope endpoint, while the LP↔agent loopback still exposes the upstream-wire `/anthropic/...` and `/openai/...` endpoints unchanged.
POST <gateway>/v1/agent/forward
Required request headers
- `Authorization: Bearer <user-api-key>` (injected by the LP)
- `X-AICG-LP-Version: <semver>`
- `X-AICG-Session-Id: <session_id>`
- `X-AICG-Repo-Binding: <signed-binding-token>`
- `Accept: text/event-stream` (streaming) or `application/json` (non-streaming)
Request body
{
"envelope": { /* full AICGEnvelope object, see §6 */ },
"wire": {
"protocol": "anthropic_messages" | "openai_chat_completions",
"stream": true,
"body": { /* original upstream provider request body, byte-for-byte */ }
}
}

On receipt, GW parses the envelope → runs the ingress pipeline → policy → routing → provider → egress → writes the response back as the upstream wire's SSE / JSON per wire.protocol; the LP forwards it transparently to the agent (the agent sees only the upstream protocol).
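On the GW side, the `{envelope, wire}` body can be decoded without touching the provider payload by keeping `wire.body` as `json.RawMessage`. The sketch below assumes the struct and function names (`ForwardRequest`, `Wire`, `parseForward`), which are illustrative; the envelope is also left raw here and would be parsed into the AICGEnvelope type separately.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Wire mirrors the "wire" object of POST /v1/agent/forward.
type Wire struct {
	Protocol string          `json:"protocol"`
	Stream   bool            `json:"stream"`
	Body     json.RawMessage `json:"body"` // kept as received, not re-parsed
}

// ForwardRequest is a hypothetical Go shape for the request body.
type ForwardRequest struct {
	Envelope json.RawMessage `json:"envelope"`
	Wire     Wire            `json:"wire"`
}

// parseForward decodes the outer object and validates wire.protocol.
func parseForward(raw []byte) (*ForwardRequest, error) {
	var req ForwardRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return nil, err
	}
	switch req.Wire.Protocol {
	case "anthropic_messages", "openai_chat_completions":
	default:
		return nil, errors.New("unsupported wire.protocol: " + req.Wire.Protocol)
	}
	if len(req.Wire.Body) == 0 {
		return nil, errors.New("wire.body is required")
	}
	return &req, nil
}

func main() {
	raw := []byte(`{"envelope":{"schema_version":"1.0"},"wire":{"protocol":"anthropic_messages","stream":true,"body":{"model": "m",  "max_tokens": 16}}}`)
	req, err := parseForward(raw)
	fmt.Println(err, string(req.Wire.Body))
}
```

Decoding into `json.RawMessage` keeps the body's bytes as they arrived. Re-encoding the struct with `json.Marshal` may normalize whitespace, so a strict byte-for-byte guarantee on the LP side would need the body framed without re-marshaling; that detail is outside this sketch.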
Response headers
- `X-AICG-Trace-Id: <uuid>`
- `X-AICG-Decision: <primary_action>` (allow|block|route|require_approval)
- `X-AICG-Modifiers: <comma-list>` (redact|escalate_to_strong_model)
- `X-AICG-Side-Effects: <comma-list>` (shadow_eval|log_only)
- `X-AICG-Reasons: <comma-list-of-rule-ids>`
- `X-AICG-Routed-To: <provider>:<model>`
- `X-AICG-Degraded-Features: <comma-list>`
- `X-AICG-Cost-Cents: <int>`: non-streaming responses only; streaming uses §5.4.3
- `X-AICG-Approval-Id: <id>`: when require_approval
5.4.3 Carrying cost / termination info in a stream (the aicg.* custom SSE events) [v0.2.2 schema formalized]
v0.2 correction: HTTP/1.1 SSE cannot append regular headers once the body has started; HTTP/2 trailers are not read by most client SDKs. Streaming no longer promises `X-AICG-Cost-Cents`. v0.2.2 makes the event schema strict and clarifies LP passthrough rules and debug behavior.
Event namespace: all AgentGate metadata events use the aicg. prefix; they cannot collide with upstream wire event names (message_start, content_block_delta, etc.).
Event 1: aicg.usage (always present at stream end; not emitted for non-streaming)
event: aicg.usage
id: <trace_id>
data: <JSON object>
{
"schema_version": "1.0",
"trace_id": "01hxxx...",
"session_id": "abc...",
"cost_cents": 42,
"cost_source": "provider_usage", // "provider_usage" | "estimated"
"tokens": {
"input": 1024,
"output": 512,
"cache_read": 0,
"cache_create": 0
},
"decision": {
"primary_action": "route",
"modifiers": [],
"side_effects": [],
"model_pool": "standard",
"reasons": ["P-ROUTE-002"]
},
"routed_to": "anthropic-prod:claude-opus-4-7", // endpoint_id:model
"attempt_no": 1,
"degraded_features": [],
"latency_ms": 8230,
"partial": false
}

Event 2: aicg.error (only emitted on failure after a token has already been flushed; pre-first-token failures use HTTP error codes)
event: aicg.error
id: <trace_id>
data: <JSON object>
{
"schema_version": "1.0",
"trace_id": "01hxxx...",
"code": "upstream_5xx", // upstream_5xx | upstream_timeout | upstream_disconnect | gw_internal
"provider_status": 502, // optional
"message": "anthropic returned 502 mid-stream",
"partial": true,
"tokens_emitted_so_far": 318
}

LP passthrough rules
| Event | LP→agent passthrough | LP internal handling |
|---|---|---|
| Upstream wire events (message_start / content_block_delta / message_stop / ...) | ✅ verbatim | Content not parsed |
| `aicg.usage` | ❌ not passed through | After parsing, update the local ledger (aicg status / aicg.tracesdb) for dashboard fallback queries |
| `aicg.error` | ❌ not passed through | Translate to the upstream wire's "stream ended early" semantics: (a) for the Anthropic wire, inject event: message_stop + data: {stop_reason: "error"}; (b) for the OpenAI wire, inject data: [DONE] |
Upstream SDKs (Anthropic SDK / OpenAI SDK) only look at upstream event names; behavior on unfamiliar `event: aicg.*` lines is SDK-implementation-specific (most ignore, a few panic). The LP must intercept them to keep SDK behavior stable.
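The interception rule can be sketched as a frame-level SSE filter: upstream events are forwarded, `aicg.*` events are withheld and handed to the LP for internal handling. The function names are hypothetical, and the sketch makes simplifying assumptions: LF line endings, whole-frame buffering before forwarding, and a 1 MiB cap on a single line.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// filterSSE copies upstream frames from r to w and returns the withheld
// aicg.* frames. A frame is the run of lines up to the next blank line.
func filterSSE(r io.Reader, w io.Writer) ([]string, error) {
	var intercepted, frame []string
	isAICG := false
	flush := func() error {
		if len(frame) == 0 {
			return nil
		}
		block := strings.Join(frame, "\n") + "\n\n"
		frame = frame[:0]
		if isAICG {
			isAICG = false
			intercepted = append(intercepted, block)
			return nil
		}
		_, err := io.WriteString(w, block)
		return err
	}
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20) // allow long data: lines
	for sc.Scan() {
		line := sc.Text()
		if line == "" { // blank line terminates the frame
			if err := flush(); err != nil {
				return intercepted, err
			}
			continue
		}
		if strings.HasPrefix(line, "event: aicg.") || strings.HasPrefix(line, "event:aicg.") {
			isAICG = true
		}
		frame = append(frame, line)
	}
	if err := sc.Err(); err != nil {
		return intercepted, err
	}
	return intercepted, flush()
}

// filterString is a convenience wrapper for trying the filter on a literal.
func filterString(in string) (string, []string, error) {
	var out strings.Builder
	got, err := filterSSE(strings.NewReader(in), &out)
	return out.String(), got, err
}

func main() {
	in := "event: message_start\ndata: {}\n\nevent: aicg.usage\nid: t1\ndata: {\"cost_cents\":42}\n\n"
	out, got, _ := filterString(in)
	fmt.Printf("%q\n%q\n", out, got)
}
```

A production handler would flush each forwarded frame to the agent immediately and would map an intercepted `aicg.error` to the wire-specific ending described in the table above.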
Debug / direct-curl mode
The LP accepts a --passthrough-aicg-events flag (dev only) that disables interception so aicg.* events go straight to stdout:
aicg start --foreground --passthrough-aicg-events
# then:
curl -N http://127.0.0.1:7777/anthropic/v1/messages -d @req.json | grep '^event: aicg'

Production deployments force this flag to false (the systemd unit / process refuses to start with it set).
Clients without SSE-event parsing can call GET /api/v1/cost/breakdown?trace_id=... to fetch the equivalent of aicg.usage.
Key constraint: once the HTTP response status line has been emitted (200 OK +
Content-Type: text/event-stream), the status cannot be changed. LP retries are only valid before the first content token has been flushed to the agent.
LP's internal state machine (in the direction of the agent):
[no status] --first-byte-from-GW-->       [200 OK headers flushed] --any-token-->   [streaming]
     |                                         |                                         |
     | GW 5xx / disconnect                     | GW 5xx but no real token yet            | GW 5xx already flushed token
     v                                         v                                         v
LP decides:                               LP can still:                             LP can only:
(a) retry internally fallback (once)      (a) retry internally fallback (once)      - inject event: aicg.error
(b) still failing -> 502 / 503 to agent   (b) still failing -> inject               - data: {trace_id, partial:true}
                                              event: aicg.error in the open 200     - close SSE stream
                                              stream, then close                    - cannot change status (200 already)
                                          (c) cannot change status
| Scenario | Behavior |
|---|---|
| Upstream 5xx, LP has not flushed any token | LP internal retry once; still failing → pass 502/503 to the agent; audit partial=false |
| Upstream 5xx, LP has already flushed at least one token | LP must keep the 200 OK; only inject event: aicg.error + data: {trace_id, code, partial:true} and close; the agent sees the SSE ending early and decides on its own retry |
| Non-streaming upstream 5xx | GW walks the fallback chain; final failure returns 502 + JSON body |
| GW unreachable | LP fail-closed (default) or fall back to direct cheap pool (admin configured) |
Implementation note: the LP-side SSE handler holds a "may retry" flag before writing the first byte; it is cleared the moment the first frame is written. All retry decisions complete before the first byte.
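The "may retry" flag from the implementation note can be modeled as a thin `http.ResponseWriter` wrapper that records the first write. Everything named below (`retryGate`, `serveWithRetry`, the three canned attempts, `runDemo`) is a hypothetical illustration; real attempts would stream from GW rather than write fixed strings.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// retryGate remembers whether anything has been sent to the agent.
type retryGate struct {
	http.ResponseWriter
	committed bool
}

func (g *retryGate) WriteHeader(code int) {
	g.committed = true
	g.ResponseWriter.WriteHeader(code)
}

func (g *retryGate) Write(p []byte) (int, error) {
	g.committed = true
	return g.ResponseWriter.Write(p)
}

// attempt streams one upstream try into w; a non-nil error means it failed.
type attempt func(w http.ResponseWriter) error

var errUpstream = errors.New("upstream 5xx")

func failBeforeWrite(w http.ResponseWriter) error { return errUpstream }
func failAfterWrite(w http.ResponseWriter) error {
	fmt.Fprint(w, "data: partial\n\n")
	return errUpstream
}
func succeed(w http.ResponseWriter) error {
	fmt.Fprint(w, "data: ok\n\n")
	return nil
}

// serveWithRetry moves to the next attempt only while nothing was written.
// Once committed, it can only signal the failure in-band and stop.
func serveWithRetry(w http.ResponseWriter, tries ...attempt) int {
	g := &retryGate{ResponseWriter: w}
	used := 0
	for _, try := range tries {
		used++
		if err := try(g); err == nil {
			return used
		}
		if g.committed {
			fmt.Fprint(g, "event: aicg.error\ndata: {\"partial\":true}\n\n")
			return used
		}
	}
	http.Error(g, "upstream unavailable", http.StatusBadGateway)
	return used
}

// runDemo runs the attempts against an in-memory recorder.
func runDemo(tries ...attempt) (int, string, int) {
	rec := httptest.NewRecorder()
	used := serveWithRetry(rec, tries...)
	return rec.Code, rec.Body.String(), used
}

func main() {
	fmt.Println(runDemo(failBeforeWrite, succeed))
	fmt.Println(runDemo(failAfterWrite, succeed))
}
```

Passing two attempts corresponds to the "retry once" rule in the table above; the second call in `main` shows that a failure after the first write never reaches the second attempt.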
# JSON Schema (informal)
AICGEnvelope:
schema_version: "1.0"
client:
lp_version: string # e.g. "0.3.1"
os: enum[darwin,linux,windows]
arch: enum[amd64,arm64]
identity: # client-side hints; the server recomputes / verifies
user_id: string
team_id: string # client-reported, GW corrects via admin YAML
machine_id: string # stable hash of hostname+install
agent:
tool: enum[claude_code, cursor, aider, codex_cli, continue, custom_oai]
tool_version: string?
wire_protocol: enum[anthropic_messages, openai_chat_completions]
session:
session_id: string # sha256(agent_pid+start_ts+repo)
turn_index: int # N-th turn within the session
is_continuation: bool # whether tool_result is included (agentic-loop continuation)
repo:
repo_id: string? # the id from the GW-issued repo binding (trusted)
remote_url: string? # client-reported, verified by GW
branch: string?
head_sha: string?
is_dirty: bool?
context_signals: # heuristic tagger output
file_paths: [string] # relative paths (deduped, capped at 64 entries)
file_fingerprints: # no content sent
- path: string
size_bytes: int
sha256: string
language: string?
diff_summary: # statistics only, no diff text
lines_added: int
lines_removed: int
contains_test_failure_keyword: bool
contains_stack_trace: bool
primary_language: string?
task_hints: # heuristic v0 client labels
task_type: enum[planning, architecture, repo_search, file_reading,
simple_edit, code_edit, test_output, debug, review,
security_review, summary, unknown]
complexity: enum[low, medium, high, unknown]
data_sensitivity: enum[low, medium, high, unknown]
agentic_loop: bool
contains_secret_like_pattern: bool
contains_pii_like_pattern: bool
security_sensitive_area: bool
routing_hints: [string] # free-form hints, log only
confidence: # client confidence per hint [0,1]
task_type: float
complexity: float
data_sensitivity: float
scan:
pre_scan_engine: "gitleaks-fork@<rev>"
pre_scan_findings: # finding type + position only; **no plaintext**
- rule_id: string
severity: enum[info,warn,critical]
offset: int
length: int
custom_tags: # custom extension (capped at 16 KB)
{string -> string|number|bool}

Contract notes
- The envelope is a hint in its entirety; the server reclassifier unconditionally recomputes `task_type / complexity / data_sensitivity`; client values feed the `confidence` table as a prior.
- The envelope does not carry prompt text (avoid duplication + reduce attack surface); prompts travel via the wire body.
- `repo.remote_url` and the `repo_id` inside the `repo_binding` token must be validated for consistency by GW.
# policies/main.yaml
version: 1
defaults:
  on_no_match: { action: route, model_pool: standard, reasons: ["default-fallthrough"] }
rules:
  - id: P-SEC-001
    description: "Restricted repos block hard secrets pattern"
    priority: 1000 # larger = evaluated first
    when: |
      envelope.repo.repo_id in restricted_repos
      && (envelope.task_hints.contains_secret_like_pattern
      || server_class.has_secret_finding)
    action: block
    reasons: ["restricted-repo + secret-detected"]
  - id: P-SEC-002
    description: "Any provider key / private key -> redact"
    priority: 990
    when: |
      server_class.findings.exists(f, f.type in ["aws_key","gcp_key","private_key","jwt"])
    action: redact
  - id: P-ROUTE-001
    description: "summary / test_output / repo_search -> cheap"
    priority: 500
    when: |
      server_class.task_type in ["summary","test_output","repo_search"]
      && envelope.task_hints.data_sensitivity != "high"
    action: route
    model_pool: cheap
  - id: P-ROUTE-002
    description: "code_edit / simple_edit / planning -> standard"
    priority: 500
    when: |
      server_class.task_type in ["code_edit","simple_edit","planning"]
    action: route
    model_pool: standard
  - id: P-ROUTE-003
    description: "architecture / debug / security_review -> strong"
    priority: 500
    when: |
      server_class.task_type in ["architecture","debug","security_review"]
    action: route
    model_pool: strong
  - id: P-ROUTE-004
    description: "Restricted repo or high sensitivity -> private_strong"
    priority: 700
    when: |
      envelope.repo.repo_id in restricted_repos
      || server_class.data_sensitivity == "high"
    action: route_to_private_model
    model_pool: private_strong
  - id: P-BUDGET-001
    description: "Team monthly budget hit hard cap -> escalate to require_approval"
    priority: 800
    when: |
      budget.team_monthly_used_cents >= budget.team_monthly_cap_cents
    action: require_approval
  - id: P-EVAL-001
    description: "Shadow eval: 5% of code_edit also sent to strong for evaluation"
    priority: 100
    when: |
      server_class.task_type == "code_edit" && rand() < 0.05
    action: shadow_eval
    model_pool: strong
variables:
  restricted_repos: ["repo_payments_core", "repo_keys_vault"]

Inputs:
envelope (client-reported)
server_class (server-side reclassifier output)
user, team, repo (auth resolved)
budget (Budget.Check snapshot, with outstanding reservations)
rand() (built-in, deterministically derived from trace_id)
Steps:
1. Load all rules; sort by priority descending
2. Evaluate `when` expressions (CEL) in order; collect all matched rules
3. Apply the three-slot merge algorithm (see §8.2):
- PrimaryAction: single-pick, deny-overrides; block > require_approval > route* > allow
- Modifiers: 0..N cumulative (redact, escalate_to_strong_model)
- SideEffects: 0..N cumulative (shadow_eval, log_only)
4. Emit Decision{
primary_action,
modifiers[],
side_effects[],
model_pool,
shadow_pool,
redactions[],
reasons:[rule_ids],
degraded_features[] // populated during routing
}
The review pointed out that v0.1's §7.3 had a single `Action`, §8 supported redact + route + shadow_eval stacking, and §16.2 introduced `Modifiers`: three inconsistent shapes. v0.2 collapses them into a three-slot model.
type Decision struct {
PrimaryAction string // single terminal slot: allow|block|route|route_to_private_model|require_approval
Modifiers []string // stackable modifiers: redact, escalate_to_strong_model
SideEffects []string // side effects: shadow_eval, log_only
ModelPool string // primary routing: cheap|standard|strong|private_strong
ShadowPool string // filled when SideEffects include shadow_eval
Redactions []RedactionSpec // filled when Modifiers include redact
Reasons []string // matched rule_ids
// [v0.2.3] Endpoint constraints — the policy does not iterate candidate members;
// it emits constraints; the RoutingEngine filters the registry during BuildChain
RequiredTrustTier string // "" | "vendor" | "partner" | "private" (minimum allowed tier)
RequiredDataResidency []string // allowed data_residency list, e.g. ["eu"], ["us","on_prem"]; empty = unrestricted
RequiredCapabilities []string // required supports.* keys, e.g. ["cache_control","tools"]; empty = no requirement
DegradedFeatures []string // populated during routing
RequireApprovalID string // filled when PrimaryAction == require_approval
}

Policy / Routing responsibility split [v0.2.3]
- Policy (runs before routing) only sees envelope / server_class / user / team / repo / budget; it does not know about candidate pool members.
- The reason policy can read `endpoints["..."].*` in CEL is to emit constraints (e.g. derive `required_data_residency=["eu"]` based on `repo.tags`), not to pick endpoints.
- The RoutingEngine filters the `provider_endpoints` registry inside `BuildChain` by the constraints: keep endpoints with `trust_tier ≥ required` ∧ `data_residency ∈ required_residency` ∧ `supports[k] == true ∀ k ∈ required_capabilities`.
- Empty member set after filtering → routing failure: return 502 + audit `routing_no_candidate_after_constraints`.
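
The constraint filter described in the last two bullets can be sketched as follows. This is a minimal illustration, not the RoutingEngine itself: the `Endpoint` and `Constraints` types are hypothetical trimmed-down stand-ins, and the tier ordering `vendor < partner < private` is an assumption inferred from "minimum allowed tier".

```go
package main

import "fmt"

// Endpoint is a hypothetical, trimmed-down view of one provider_endpoints entry.
type Endpoint struct {
	ID            string
	TrustTier     string
	DataResidency string
	Supports      map[string]bool
}

// Constraints mirrors the Required* fields emitted on Decision.
type Constraints struct {
	TrustTier     string   // "" means no minimum
	DataResidency []string // empty means unrestricted
	Capabilities  []string // empty means no requirement
}

// tierRank encodes the assumed ordering vendor < partner < private.
var tierRank = map[string]int{"": 0, "vendor": 1, "partner": 2, "private": 3}

func satisfies(e Endpoint, c Constraints) bool {
	if tierRank[e.TrustTier] < tierRank[c.TrustTier] {
		return false
	}
	if len(c.DataResidency) > 0 {
		allowed := false
		for _, r := range c.DataResidency {
			if r == e.DataResidency {
				allowed = true
				break
			}
		}
		if !allowed {
			return false
		}
	}
	for _, k := range c.Capabilities {
		if !e.Supports[k] {
			return false
		}
	}
	return true
}

// FilterEndpoints returns the IDs of endpoints that satisfy every constraint.
// An empty result corresponds to routing_no_candidate_after_constraints.
func FilterEndpoints(eps []Endpoint, c Constraints) []string {
	var ids []string
	for _, e := range eps {
		if satisfies(e, c) {
			ids = append(ids, e.ID)
		}
	}
	return ids
}

func main() {
	registry := []Endpoint{
		{ID: "anthropic-prod", TrustTier: "vendor", DataResidency: "us", Supports: map[string]bool{"tools": true, "cache_control": true}},
		{ID: "azure-tenant-1", TrustTier: "private", DataResidency: "eu", Supports: map[string]bool{"tools": true}},
		{ID: "ollama-cluster", TrustTier: "private", DataResidency: "on_prem", Supports: map[string]bool{}},
	}
	c := Constraints{DataResidency: []string{"eu", "on_prem"}, Capabilities: []string{"tools"}}
	fmt.Println(FilterEndpoints(registry, c)) // [azure-tenant-1]
}
```

Keeping the filter a pure function of (registry, constraints) is what lets policy stay unaware of pool members: policy only fills in `Constraints`.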
Slot semantics
| Slot | Cardinality | Terminating | Examples |
|---|---|---|---|
| `PrimaryAction` | single-pick | Decides whether the request enters routing | route, block, require_approval |
| `Modifiers` | 0..N | Mutates request content or routing preference; does not flip allow/deny | redact (replace prompt text), escalate_to_strong_model (bump pool up) |
| `SideEffects` | 0..N | No effect on the primary path; produces additional events | shadow_eval (dual emit to ShadowPool), log_only (write audit + webhook, non-blocking) |
Example
{
primary_action: "route_to_private_model",
modifiers: ["redact"],
side_effects: ["shadow_eval"],
model_pool: "private_strong",
shadow_pool: "strong",
redactions: [{type:"aws_key", offset:1024, length:40}],
reasons: ["P-SEC-REDACT-001", "P-SENS-UP-003", "P-EVAL-001"]
}
Core algorithm: priority desc + deny-overrides, resolved against the §7.3 three-slot model.
v0.2 revision: terminology fully aligned with §7.3 / §16.2. There is no generic "action" slot; the `action:` field in the rule file is mapped to one of the three slots at parse time.
In YAML you write:
- id: P-...
action: <verb> # single-pick
modifiers: [<verb>, ...] # optional
side_effects: [<verb>, ...] # optional

`<verb>` category table (`action:` for a single rule must be from the PrimaryAction column; `modifiers:` / `side_effects:` each from its own column):
| Verb | Slot | Meaning |
|---|---|---|
| `allow` | PrimaryAction | Equivalent to route(default_pool); pass through |
| `block` | PrimaryAction | terminal-deny; 451 + audit |
| `route` | PrimaryAction | Enter routing; pool from model_pool |
| `route_to_private_model` | PrimaryAction | route specialization; pool forced to private_strong |
| `require_approval` | PrimaryAction | terminal-pending; write to the approval queue |
| `redact` | Modifier | Inbound text placeholder substitution; continue to route |
| `escalate_to_strong_model` | Modifier | Bump the picked pool to strong (no-op if already strong/private_strong) |
| `shadow_eval` | SideEffect | Duplicate the request to the ShadowPool referenced by model_pool; not returned to the client |
| `log_only` | SideEffect | No effect on the decision; only writes audit + fires webhook |
hits = filter(rules, r => CEL(r.when) == true) sorted by priority desc
PrimaryAction:
blocks = [r for r in hits if r.action == "block"]
if blocks: result.PrimaryAction = "block"; result.Reasons = [highest_priority_block.id]; return
approvals = [r for r in hits if r.action == "require_approval"]
if approvals: result.PrimaryAction = "require_approval"; result.RequireApprovalID = mint(); ...
routes = [r for r in hits if r.action in ("route","route_to_private_model","allow")]
if routes: take highest priority -> result.PrimaryAction & ModelPool
else: apply defaults.on_no_match
Modifiers (non-terminating; everything accumulates):
for r in hits:
for m in r.modifiers: result.Modifiers.add(m)
if r.action == "redact": result.Modifiers.add("redact") # backward-compat shorthand
if r.action == "escalate_to_strong_model": result.Modifiers.add(...)
SideEffects (everything accumulates):
same as above; shadow_eval / log_only all added
result.Reasons = unique(rule_ids of all contributing hits)
Example 1: block vetoes everything else
Matched: R1 block prio=1000 / R2 redact prio=990 / R3 route_to_private_model prio=700
Result: PrimaryAction=block, Reasons=[R1]
Example 2: modifier + side_effect stacking
Matched: R2 redact prio=990 / R3 route_to_private_model prio=700 / R5 shadow_eval[strong] prio=100
Result: {
PrimaryAction: "route_to_private_model",
Modifiers: ["redact"],
SideEffects: ["shadow_eval"],
ModelPool: "private_strong",
ShadowPool: "strong",
Reasons: [R2, R3, R5]
}
Example 3: fallthrough
Matched: 0 rules
Result: defaults.on_no_match → route(standard)
Conflict worked example
Matched:
R1 block prio=1000
R2 redact prio=990
R3 route_to_private_model prio=700
R4 route[standard] prio=500
R5 shadow_eval[strong] prio=100
Decision:
block matched -> terminal -> PrimaryAction=block, Reasons=[R1]
(R2-R5 not processed further)
Matched:
R2 redact prio=990
R3 route_to_private_model prio=700
R5 shadow_eval[strong] prio=100
Decision:
redact is a modifier -> stacked
terminal-allow picks R3 -> route_to_private_model (private_strong)
shadow_eval is a side-effect -> stacked
Final: PrimaryAction=route_to_private_model, Modifiers=[redact], SideEffects=[shadow_eval], pool=private_strong
Exception: no rule matches → defaults.on_no_match.
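
A compact Go sketch of the merge above may help when reading the worked examples. It is illustrative only: `Rule` is a hypothetical "already matched" rule, the treatment of `require_approval` as an early return is an assumption (the pseudocode elides what follows it), and `RequireApprovalID` minting is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule is a hypothetical matched rule: its `when` expression already evaluated to true.
type Rule struct {
	ID          string
	Priority    int
	Action      string // PrimaryAction verb, or a legacy shorthand such as "redact"
	ModelPool   string
	Modifiers   []string
	SideEffects []string
}

type Decision struct {
	PrimaryAction string
	Modifiers     []string
	SideEffects   []string
	ModelPool     string
	ShadowPool    string
	Reasons       []string
}

func addUnique(xs []string, x string) []string {
	for _, v := range xs {
		if v == x {
			return xs
		}
	}
	return append(xs, x)
}

// Merge resolves matched rules into the three slots (priority desc, deny-overrides).
func Merge(hits []Rule, defaultPool string) Decision {
	sorted := append([]Rule(nil), hits...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Priority > sorted[j].Priority })
	// Terminal verbs win outright. Assumption: require_approval stops accumulation like block.
	for _, verb := range []string{"block", "require_approval"} {
		for _, r := range sorted {
			if r.Action == verb {
				return Decision{PrimaryAction: verb, Reasons: []string{r.ID}}
			}
		}
	}
	d := Decision{PrimaryAction: "route", ModelPool: defaultPool} // defaults.on_no_match
	primary := -1
	for i, r := range sorted {
		if r.Action != "route" && r.Action != "route_to_private_model" && r.Action != "allow" {
			continue
		}
		d.PrimaryAction, d.ModelPool, primary = r.Action, r.ModelPool, i
		if r.Action == "route_to_private_model" {
			d.ModelPool = "private_strong" // pool is forced for this verb
		} else if r.Action == "allow" {
			d.ModelPool = defaultPool // allow == route(default_pool)
		}
		break
	}
	for i, r := range sorted {
		used := i == primary
		mods, effects := r.Modifiers, r.SideEffects
		switch r.Action { // backward-compat shorthand: verb written in `action:`
		case "redact", "escalate_to_strong_model":
			mods = append([]string{r.Action}, mods...)
		case "shadow_eval", "log_only":
			effects = append([]string{r.Action}, effects...)
		}
		for _, m := range mods {
			d.Modifiers, used = addUnique(d.Modifiers, m), true
			if m == "escalate_to_strong_model" && d.ModelPool != "private_strong" {
				d.ModelPool = "strong"
			}
		}
		for _, s := range effects {
			d.SideEffects, used = addUnique(d.SideEffects, s), true
			if s == "shadow_eval" {
				d.ShadowPool = r.ModelPool
			}
		}
		if used {
			d.Reasons = append(d.Reasons, r.ID)
		}
	}
	return d
}

func main() {
	hits := []Rule{{ID: "R5", Priority: 100, Action: "shadow_eval", ModelPool: "strong"},
		{ID: "R3", Priority: 700, Action: "route_to_private_model"},
		{ID: "R2", Priority: 990, Action: "redact"}}
	fmt.Printf("%+v\n", Merge(hits, "standard"))
}
```

Run against the rules of Example 2, the sketch yields the same three-slot result: `route_to_private_model` as the primary action, `redact` as the only modifier, and `shadow_eval` pointing at the `strong` pool.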
Review v0.2.1 #4: v0.2's allowlist stored full URLs, but pool member fields are logical names (e.g. `endpoint: "ollama-cluster"`), so the two cannot match directly. v0.2.1 introduces a `provider_endpoints` registry as the single source of truth; pools only reference `endpoint_id`; the allowlist is the set of `endpoint_id` values.
# pools.yaml
# (1) Endpoint registry [v0.2.2 extended attributes]
# - The only allowed list of physical endpoints
# - Carries compliance / trust / capability attributes so policies can reference them directly
# (avoiding scattered `variables` lists)
provider_endpoints:
anthropic-prod:
provider: anthropic
url: "https://api.anthropic.com"
data_residency: us
trust_tier: vendor # vendor | partner | private
supports:
streaming: true
tools: true
cache_control: true
extended_thinking: true
openai-prod:
provider: openai
url: "https://api.openai.com/v1"
data_residency: us
trust_tier: vendor
supports:
streaming: true
tools: true
cache_control: false
extended_thinking: false
openrouter-prod:
provider: openrouter
url: "https://openrouter.ai/api/v1"
data_residency: us
trust_tier: vendor
supports: { streaming: true, tools: true, cache_control: false, extended_thinking: false }
azure-tenant-1:
provider: azure_openai
url: "https://acme-tenant.openai.azure.com"
deployments: ["ent-gpt4o", "ent-gpt4o-eu"]
data_residency: eu
trust_tier: private # enterprise-owned tenant
supports: { streaming: true, tools: true, cache_control: false, extended_thinking: false }
bedrock-us-east-1:
provider: anthropic_bedrock
url: "https://bedrock-runtime.us-east-1.amazonaws.com"
data_residency: us
trust_tier: private
supports: { streaming: true, tools: true, cache_control: true, extended_thinking: true }
ollama-cluster:
provider: openai_compat
url: "https://ollama-cluster.internal:11434/v1"
data_residency: on_prem
trust_tier: private
supports: { streaming: true, tools: false, cache_control: false, extended_thinking: false }
vllm-prod:
provider: openai_compat
url: "https://vllm-prod.internal:8000/v1"
data_residency: on_prem
trust_tier: private
supports: { streaming: true, tools: true, cache_control: false, extended_thinking: false }
litellm-internal:
provider: litellm
url: "https://litellm-proxy.internal:4000"
data_residency: us
trust_tier: partner
supports: { streaming: true, tools: true, cache_control: false, extended_thinking: false }
# (2) GW startup / reload checks:
# - Every pool member's endpoint_id must be in provider_endpoints
# - Registry changes require codeowner dual approval (CODEOWNERS lists configs/pools.yaml)
# - Any URL not in the registry appearing in routing_event.member_selected fires an immediate alert
#
# (3) Policy may reference directly [v0.2.3 correction: constraint output, not iteration]:
# - Policy runs before routing and **does not** receive candidate members.
# - In CEL, reading endpoints[<id>] is used to generate constraint fields on Decision:
# required_trust_tier, required_data_residency, required_capabilities
# - The RoutingEngine filters the registry by these constraints during BuildChain to pick compliant candidates.
# Example policy CEL:
# when: "eu-data-residency" in repo.tags
# emit: required_data_residency = ["eu", "on_prem"]
# Future EU-only / private-only / cache_control-required rules are all expressed via required_* constraints,
# with no scattered variables and no endpoint picking in policy.
pools:
cheap:
members:
- { endpoint_id: openrouter-prod, model: "deepseek/deepseek-chat", weight: 70 }
- { endpoint_id: ollama-cluster, model: "qwen2.5-coder:7b", weight: 30 }
fallback_pool: standard
max_attempts: 2
timeout_ms: 60000
standard:
members:
- { endpoint_id: anthropic-prod, model: "claude-sonnet-4-6", weight: 60 }
- { endpoint_id: openai-prod, model: "gpt-4o", weight: 40 }
fallback_pool: strong
max_attempts: 3
timeout_ms: 90000
strong:
members:
- { endpoint_id: anthropic-prod, model: "claude-opus-4-7", weight: 80 }
- { endpoint_id: openai-prod, model: "o3", weight: 20 }
fallback_pool: null
max_attempts: 2
timeout_ms: 180000
private_strong:
members:
- { endpoint_id: azure-tenant-1, deployment: "ent-gpt4o", model: "gpt-4o", weight: 60 }
- { endpoint_id: bedrock-us-east-1, model: "anthropic.claude-opus-4-7", weight: 40 }
fallback_pool: null
max_attempts: 1 # private domain does not tolerate cross-domain fallback
timeout_ms: 240000

The `private` flag comes from the `provider_endpoints` registry; pool members do not redeclare it.
Review v0.2 #5: weighted random must be replayable (trace_id-derived seed). Review v0.2.3 #1: `BuildChain` consumes the constraints emitted by `Decision` to filter the registry.
SelectChain(pool, decision, trace_id, attempt_no):
seed = sha256(trace_id || ":" || pool || ":" || attempt_no)[:16] # [v0.2]
rng = ChaCha8(seed) # [v0.2]
1. Read pool.members; resolve each member's endpoint attributes from the registry
2. Filter by constraints [v0.2.3]:
member.endpoint.trust_tier >= decision.required_trust_tier
AND (decision.required_data_residency is empty OR
member.endpoint.data_residency in decision.required_data_residency)
AND (for all cap in decision.required_capabilities: member.endpoint.supports[cap] == true)
3. Filter out members whose circuit breaker is open
4. Weighted-sample order [m1, m2, ...] using rng
5. If fallback_pool is not null, recursively SelectChain(fallback_pool, decision, trace_id, attempt_no)
and append to the tail (fallback must also satisfy the same constraints)
6. Truncate to max_attempts
7. Write [seed, applied_constraints] into routing_event columns # [v0.2 / v0.2.3]
8. If the post-filter set (after steps 2-3) is empty -> return ErrNoCandidate without sampling; ingress turns it into 502 + audit
"routing_no_candidate_after_constraints"
Execute(chain, request):
for i, m in enumerate(chain):
try:
resp = adapter.send(m, request, timeout=pool.timeout_ms)
if resp.ok: return resp
except RetryableError [5xx, network, timeout]:
breaker(m).recordFailure()
record routing_event(attempt_no=i, error=...)
continue
except NonRetryable [4xx auth, 400 invalid]:
return error
return last_error
Replay tool: `aicg routingctl replay <trace_id>` rebuilds the chain from `routing_event.seed` and verifies that the decision is reproducible.
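
The replayable ordering can be sketched with Go's standard library (Go 1.22+ for `math/rand/v2`). This is a minimal sketch under stated assumptions: `rand.NewChaCha8` takes a 32-byte seed, so the example feeds it the full SHA-256 digest rather than the 16-byte prefix shown in the pseudocode, and `Member` is a hypothetical reduced pool member.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/rand/v2"
)

// Member is a hypothetical reduced pool member: endpoint plus weight.
type Member struct {
	EndpointID string
	Weight     int
}

// chainSeed derives the RNG seed from trace_id, pool and attempt number.
// Assumption: the whole digest is used because ChaCha8 wants 32 bytes.
func chainSeed(traceID, pool string, attemptNo int) [32]byte {
	return sha256.Sum256([]byte(fmt.Sprintf("%s:%s:%d", traceID, pool, attemptNo)))
}

// WeightedOrder returns the members in weighted-random order without
// replacement. The same seed always produces the same order.
func WeightedOrder(members []Member, seed [32]byte) []Member {
	rng := rand.New(rand.NewChaCha8(seed))
	rest := append([]Member(nil), members...) // never mutate the caller's slice
	out := make([]Member, 0, len(rest))
	for len(rest) > 0 {
		total := 0
		for _, m := range rest {
			total += m.Weight
		}
		if total <= 0 { // only zero-weight members left: keep config order
			return append(out, rest...)
		}
		pick := rng.IntN(total)
		for i, m := range rest {
			if pick < m.Weight {
				out = append(out, m)
				rest = append(rest[:i], rest[i+1:]...)
				break
			}
			pick -= m.Weight
		}
	}
	return out
}

func main() {
	members := []Member{
		{EndpointID: "anthropic-prod", Weight: 60},
		{EndpointID: "openai-prod", Weight: 40},
	}
	seed := chainSeed("7f3c9a", "standard", 1)
	fmt.Println(WeightedOrder(members, seed))
}
```

Because the order is a pure function of `(trace_id, pool, attempt_no)` and the member list, a replay only needs the stored seed and the pool configuration that was active at the time.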
- `sony/gobreaker`, per `(provider, model)`
- Threshold: 10 consecutive failures, or > 50% failure rate within 1 minute → open
- half-open: allow 1 probe request after 30 seconds
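
The trip condition can be written as a small predicate. The sketch below is dependency-free on purpose: `Counts` is a local, hypothetical type that only mirrors the kind of counters a breaker library exposes, and the source does not specify a minimum sample size for the failure-rate branch, so none is applied here.

```go
package main

import "fmt"

// Counts is a hypothetical local type holding the per-window counters.
type Counts struct {
	Requests            uint32
	TotalFailures       uint32
	ConsecutiveFailures uint32
}

// ShouldTrip reports whether the breaker should open: 10 consecutive failures,
// or a failure rate above 50% within the current 1-minute window.
func ShouldTrip(c Counts) bool {
	if c.ConsecutiveFailures >= 10 {
		return true
	}
	return c.Requests > 0 && float64(c.TotalFailures)/float64(c.Requests) > 0.5
}

func main() {
	fmt.Println(ShouldTrip(Counts{Requests: 100, TotalFailures: 51}))                          // true
	fmt.Println(ShouldTrip(Counts{Requests: 100, TotalFailures: 50}))                          // false
	fmt.Println(ShouldTrip(Counts{Requests: 10, TotalFailures: 10, ConsecutiveFailures: 10})) // true
}
```

Without a request floor, a single failed request in an otherwise idle window already counts as a 100% failure rate; a production predicate would probably add one.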
The single authoritative definition is in §5.4.4. The routing engine's failure handling follows that state machine:
- After any token has been flushed to the agent, status cannot be changed; only inject `event: aicg.error` and close
- Mid-stream provider 5xx does not auto-fallback (stitching semantics are too brittle)
- Internal retries are only allowed before the LP flushes the first token to the agent (once)
- See §5.4.4 for the state machine and the implementation note on the LP "may retry" flag
Review v0.2.1 #7: Reserve has different semantics in P0 vs P1; this must be made explicit.
| Dimension | Phase | Implementation |
|---|---|---|
| Per-API-key request rate | P0 | In-memory token bucket; over-limit returns 429 |
| Per-team monthly $ | P0 | Reserve / Commit / Release interfaces land; outstanding recorded for CLI reports; non-blocking; soft 80% only writes audit_event + red flag in aicg stats output (P0 does not fire a webhook — the webhook subsystem ships in §18 P1) |
| Per-team monthly $ | P1 | Reserve returns ErrHardCapExceeded when over the hard cap; ingress translates that into require_approval; soft 80% also fires the webhook |
| Per-user daily $ | same | same |
| Per-provider concurrent | P0 | Semaphore (prevent blowing through a provider's quota) |
Estimation formula (same from P0)
estimated_cents
= input_tokens x pricing.input_cents_per_mtok / 1_000_000
+ max_output_tokens x pricing.output_cents_per_mtok / 1_000_000
max_output_tokens uses the request's explicit value; if unspecified, use the pool default (a conservative upper bound).
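
As a worked instance of the formula, the following sketch computes the estimate in integer arithmetic. The rounding direction (up to a whole cent) and the example prices are assumptions made for illustration; the source only fixes the formula itself.

```go
package main

import "fmt"

// Pricing holds per-million-token prices in cents (values below are made up).
type Pricing struct {
	InputCentsPerMTok  int64
	OutputCentsPerMTok int64
}

// EstimateCents applies the reservation formula. If the request gives no
// max_output_tokens, the pool default is used as the conservative bound.
// Assumption: the result is rounded up so a reservation never undershoots.
func EstimateCents(inputTokens, maxOutputTokens, poolDefaultMaxOutput int64, p Pricing) int64 {
	if maxOutputTokens <= 0 {
		maxOutputTokens = poolDefaultMaxOutput
	}
	// Numerator is in "cents x tokens per million"; divide once at the end.
	scaled := inputTokens*p.InputCentsPerMTok + maxOutputTokens*p.OutputCentsPerMTok
	return (scaled + 999_999) / 1_000_000
}

func main() {
	p := Pricing{InputCentsPerMTok: 300, OutputCentsPerMTok: 1500}
	// 12000*300 + 4096*1500 = 9,744,000 -> 9.744 cents -> reserved as 10
	fmt.Println(EstimateCents(12000, 4096, 8192, p)) // 10
}
```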
Settlement
- Normal completion: routing succeeds → cost extractor computes `actual_cents` → `Commit(reservation_id, actual_cents)`; the delta `(estimated - actual)` is returned automatically
- Provider failure: `Release(reservation_id)`, full refund
- Client disconnect: `Release` is triggered in a `ctx.Done()` defer
- Process crash: a background settler scans `created_at < now - 10min && state == reserved` every minute → `Release` + alert (Reserve-leak alert)
Detailed schema: §12.5. Interface signatures: §16.8.
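
To make the Reserve / Commit / Release lifecycle concrete, here is an in-memory sketch with P1 semantics (hard cap enforced). The `Ledger` type and its method signatures are hypothetical and are not the §16.8 interface; only the name `ErrHardCapExceeded` comes from the table above.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrHardCapExceeded is what Reserve returns in P1 once the cap would be crossed.
var ErrHardCapExceeded = errors.New("budget: hard cap exceeded")

type reservation struct {
	team      string
	estimated int64
}

// Ledger is a hypothetical in-memory stand-in for the budget_reservation table.
type Ledger struct {
	mu     sync.Mutex
	caps   map[string]int64       // team -> monthly hard cap (cents)
	used   map[string]int64       // team -> committed + outstanding reserved cents
	open   map[string]reservation // reservation_id -> outstanding reservation
	nextID int
}

func NewLedger(caps map[string]int64) *Ledger {
	return &Ledger{caps: caps, used: map[string]int64{}, open: map[string]reservation{}}
}

// Reserve holds the estimate against the team's cap before the request is routed.
func (l *Ledger) Reserve(team string, estimated int64) (string, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if l.used[team]+estimated > l.caps[team] {
		return "", ErrHardCapExceeded
	}
	l.nextID++
	id := fmt.Sprintf("rsv-%d", l.nextID)
	l.open[id] = reservation{team: team, estimated: estimated}
	l.used[team] += estimated
	return id, nil
}

// settle closes a reservation and replaces its estimate with the final charge.
func (l *Ledger) settle(id string, charge int64) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	r, ok := l.open[id]
	if !ok {
		return fmt.Errorf("budget: reservation %s is not outstanding", id)
	}
	delete(l.open, id)
	l.used[r.team] += charge - r.estimated
	return nil
}

// Commit records the actual cost; the (estimated - actual) delta flows back.
func (l *Ledger) Commit(id string, actual int64) error { return l.settle(id, actual) }

// Release refunds the whole reservation (provider failure, disconnect, expiry).
func (l *Ledger) Release(id string) error { return l.settle(id, 0) }

// Used reports committed plus outstanding reserved cents for a team.
func (l *Ledger) Used(team string) int64 {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.used[team]
}

func main() {
	l := NewLedger(map[string]int64{"team-a": 100})
	id, _ := l.Reserve("team-a", 60)
	_, err := l.Reserve("team-a", 50)
	fmt.Println(errors.Is(err, ErrHardCapExceeded)) // true: 60 + 50 > 100
	_ = l.Commit(id, 20)
	fmt.Println(l.Used("team-a")) // 20
}
```

Counting outstanding reservations toward `used` is the point of the design: a burst of concurrent requests cannot all pass the cap check and then overshoot it on settlement.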
Review #6: v0.1's IR directly referenced `anthropic.MessageParam` and similar SDK types, binding us to a specific SDK version and reproducing the CCR implicit-mutate pitfall. v0.2 introduces our own `ir.*` types. Anthropic SDK types only appear inside the `provider/anthropic.go` adapter.
package ir
// Semantically "Anthropic-Messages-API-shaped" but with independent fields - no SDK types.
type Message struct {
Role Role // user|assistant|tool
Content []ContentBlock
}
type Role string
const (
RoleUser Role = "user"
RoleAssistant Role = "assistant"
RoleTool Role = "tool" // tool_result uses a dedicated role in our IR
)
type ContentBlock struct {
Type BlockType
Text string // Type == Text
ToolUse *ToolUse // Type == ToolUse
ToolResult *ToolResult // Type == ToolResult
Thinking *ThinkingBlock // Type == Thinking
CacheControl *CacheControlSpec // segment-level cache marker
}
type BlockType string
const (
BlockText BlockType = "text"
BlockToolUse BlockType = "tool_use"
BlockToolResult BlockType = "tool_result"
BlockThinking BlockType = "thinking"
)
type Tool struct {
Name string
Description string
InputSchema map[string]any // JSON Schema
}
type ToolChoice struct {
Type ToolChoiceType
Name string // when Type == Specific
}
type ToolChoiceType string
const (
ToolChoiceAuto ToolChoiceType = "auto"
ToolChoiceAny ToolChoiceType = "any"
ToolChoiceSpecific ToolChoiceType = "specific"
)
type ToolUse struct {
ID string
Name string
Input map[string]any
}
type ToolResult struct {
ToolUseID string
Content []ContentBlock // usually Text only
IsError bool
}
type ThinkingBlock struct {
Text string
}
type CacheControlSpec struct {
Type string // "ephemeral"
TTL string // optional: "5m", "1h"
}
type ThinkingSpec struct {
Enabled bool
BudgetTokens int
}
type StopReason string
const (
StopEndTurn StopReason = "end_turn"
StopMaxTokens StopReason = "max_tokens"
StopToolUse StopReason = "tool_use"
StopStopSeq StopReason = "stop_sequence"
)

// internal/gw/provider/adapter.go
package provider

import (
	"context" // needed by the Adapter method signatures below

	"agentgate/internal/shared/ir"
)
type IRRequest struct {
Model string
Messages []ir.Message
System []ir.ContentBlock // system may contain cache_control
Tools []ir.Tool
ToolChoice *ir.ToolChoice
MaxTokens int
Temperature *float64
Stream bool
Thinking *ir.ThinkingSpec
Metadata map[string]string
}
type IRStreamEvent struct {
Type IRStreamEventType // message_start | content_block_start | content_block_delta |
// content_block_stop | message_delta | message_stop |
// aicg_usage (internally generated)
Raw []byte // Serialized event (re-converted when emitted in the downstream wire)
// Parsed strongly-typed fields (populated per Type)
BlockIndex int
Delta *ir.ContentBlock
Usage *Usage
}
type IRResponse struct {
ID string
Model string
Content []ir.ContentBlock
StopReason ir.StopReason
Usage Usage
}
type Usage struct {
InputTokens int
OutputTokens int
CacheReadTokens int
CacheCreationTokens int
}
type Adapter interface {
// Identifier
Name() string // "anthropic", "openai", "azure_openai", "openai_compat", "anthropic_bedrock"
SupportsStreaming() bool
SupportsTools() bool
SupportsCacheControl() bool
SupportsExtendedThinking() bool
// Send non-streaming
Send(ctx context.Context, req IRRequest, member PoolMember) (*IRResponse, error)
// Send streaming; events are pre-converted back to Anthropic IR events
SendStream(ctx context.Context, req IRRequest, member PoolMember) (<-chan IRStreamEvent, <-chan error)
// Token counting (used by /v1/messages/count_tokens)
CountTokens(ctx context.Context, req IRRequest, member PoolMember) (int, error)
}
type PoolMember struct {
EndpointID string // [v0.2.1] references the provider_endpoints registry (§9.1)
Provider string // resolved from the endpoint registry; not configured separately
URL string // same as above
Model string
Deployment string // for azure_openai
Private bool // resolved from the registry
KeyRef SecretRef // env://, file://, vault://
ExtraOpts map[string]any
}

Target adapter list & phase matrix [v0.2.1]
| Adapter | P0 | P1 | P2 | P3 |
|---|---|---|---|---|
anthropic (direct Anthropic API) |
✅ | |||
openai_compat (Ollama / vLLM / LM Studio / TGI / SGLang / self-hosted) |
✅ | |||
openai (direct OpenAI) |
✅ | |||
openrouter (shares the OpenAI wire) |
✅ | |||
litellm (shares the OpenAI wire) |
✅ | |||
azure_openai (Azure OpenAI Service) |
✅ | |||
anthropic_bedrock (AWS Bedrock) |
✅ |
P0 picks `anthropic` + `openai_compat`: the former covers the main Claude Code traffic, the latter covers enterprise self-hosted and local inference at no extra transformer cost. Direct OpenAI is deferred to P1 alongside the LP OpenAI-compatible endpoint. Azure / Bedrock are enterprise hardening (P3); they need IAM / tenant integration.
Review v0.2.3 #4: do not conflate "provider key retrieval wrapped by KMS" with the raw store's "DEK envelope encryption" — they are separate subsystems with independent phasing.
| Subsystem | Purpose | P0 | P1 | P2 | P3 |
|---|---|---|---|---|---|
| Secret resolver | At startup, resolves `provider_endpoints[*].key_ref` and other secret references → in-memory cleartext provider API key | `env://`, `file://` ✅ | `vault://` ✅ (if the enterprise already runs Vault) | `aws-secretsmanager://`, `gcp-secret-manager://` ✅ | KMS-managed secret (double-wrap) ✅ |
| Raw store KMS | DEK envelope encryption to write raw prompt/response into S3 | — | — | KMS abstraction ships (aws_kms / gcp_kms / vault_transit / local_file_kek) ✅ | Multi-KMS / per-tenant / customer-managed keys ✅ |
Interface contract
// internal/shared/secretref/resolver.go
type Resolver interface {
// One-shot resolve at startup; schemes: env, file, vault, aws-secretsmanager, gcp-secret-manager
Resolve(ctx context.Context, ref string) (plaintext []byte, err error)
}

P0 residual risk: with only `env://` / `file://`, provider keys are present in cleartext in the GW process memory and the systemd unit's environment; a host compromise leaks them. Mitigations:
- `file://` files at mode 0600 and readable only by the GW SA
- `env://` only via systemd `EnvironmentFile=`, not `Environment=` (avoids `ps eww` leakage)
- Audit entry `secret_loaded` (records only the ref, never the plaintext)
The threat-model lists this residual under TB-7 / Top-1 (see threat-model.md).
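
A P0-only resolver that enforces the file-mode mitigation might look like the sketch below. It is not the shipped implementation: `P0Resolver` and its injectable `LookupEnv` field are hypothetical, and the exact error wording is illustrative. Errors mention only the reference, never the secret value.

```go
package main

import (
	"bytes"
	"context"
	"fmt"
	"os"
	"strings"
)

// P0Resolver handles only env:// and file://, the two schemes available in P0.
// LookupEnv is injectable so the sketch can run without touching the real
// environment (an assumption of this example, not part of the contract).
type P0Resolver struct {
	LookupEnv func(key string) (string, bool)
}

func (r P0Resolver) Resolve(_ context.Context, ref string) ([]byte, error) {
	scheme, rest, ok := strings.Cut(ref, "://")
	if !ok || rest == "" {
		return nil, fmt.Errorf("secretref: malformed reference %q", ref)
	}
	switch scheme {
	case "env":
		lookup := r.LookupEnv
		if lookup == nil {
			lookup = os.LookupEnv
		}
		v, found := lookup(rest)
		if !found || v == "" {
			return nil, fmt.Errorf("secretref: environment variable %s is not set", rest)
		}
		return []byte(v), nil
	case "file":
		info, err := os.Stat(rest)
		if err != nil {
			return nil, fmt.Errorf("secretref: %w", err)
		}
		// Enforce the 0600 mitigation: no group or world permission bits.
		if info.Mode().Perm()&0o077 != 0 {
			return nil, fmt.Errorf("secretref: %s must not be accessible to group or others", rest)
		}
		b, err := os.ReadFile(rest)
		if err != nil {
			return nil, fmt.Errorf("secretref: %w", err)
		}
		return bytes.TrimSpace(b), nil
	default:
		return nil, fmt.Errorf("secretref: scheme %q is not available in P0", scheme)
	}
}

func main() {
	r := P0Resolver{}
	if _, err := r.Resolve(context.Background(), "vault://kv/anthropic"); err != nil {
		fmt.Println(err)
	}
}
```

Rejecting unknown schemes at startup keeps a P1-style `vault://` reference from silently resolving to nothing on a P0 deployment.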
- inbound: OpenAI Chat Completions request → IR (only when the LP route is `/openai/...`)
- outbound: IR → target provider wire
  - Same family (Anthropic→Anthropic, OpenAI→OpenAI-style): passthrough
  - Cross family (Anthropic IR → OpenAI-style):
    - Reshape `messages` into OpenAI's `messages` (merge system, tool_use → `tool_calls`, tool_result → `tool` role)
    - Strip `cache_control`; `degraded_features += ["cache_control"]`
    - Strip `thinking`; `degraded_features += ["extended_thinking"]`
    - Pass `stop_sequences` through
- inbound stream: rewrite OpenAI SSE deltas into Anthropic IR `content_block_delta`
- count_tokens: when crossing providers, estimate locally using anthropic-tokenizer and append `estimated: true`
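
The two "strip and record" steps of the cross-family path can be sketched as below. The types are local, simplified stand-ins for the `ir` package, and treating "strip thinking" as dropping thinking blocks is an assumption of this example. The function copies blocks instead of editing them in place, in keeping with the goal of avoiding implicit mutation.

```go
package main

import "fmt"

type CacheControlSpec struct{ Type string }

// ContentBlock is a simplified stand-in for ir.ContentBlock.
type ContentBlock struct {
	Type         string // "text" | "thinking" | ...
	Text         string
	CacheControl *CacheControlSpec
}

// StripForOpenAIWire removes features an OpenAI-style wire cannot carry and
// reports them so the caller can append to degraded_features.
func StripForOpenAIWire(blocks []ContentBlock) ([]ContentBlock, []string) {
	out := make([]ContentBlock, 0, len(blocks))
	var degraded []string
	seen := map[string]bool{}
	mark := func(feature string) {
		if !seen[feature] {
			seen[feature] = true
			degraded = append(degraded, feature)
		}
	}
	for _, b := range blocks { // b is a copy; the caller's blocks stay untouched
		if b.Type == "thinking" {
			mark("extended_thinking")
			continue
		}
		if b.CacheControl != nil {
			b.CacheControl = nil
			mark("cache_control")
		}
		out = append(out, b)
	}
	return out, degraded
}

func main() {
	in := []ContentBlock{
		{Type: "text", Text: "system prompt", CacheControl: &CacheControlSpec{Type: "ephemeral"}},
		{Type: "thinking", Text: "scratchpad"},
		{Type: "text", Text: "fix the failing test"},
	}
	out, degraded := StripForOpenAIWire(in)
	fmt.Println(len(out), degraded) // 2 [cache_control extended_thinking]
}
```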
S3 object key:
raw/tenant=<tid>/repo=<repo_id>/sensitivity=<low|med|high>/dt=YYYY-MM-DD/<trace_id>.bin
Object body layout (binary):
| magic "AICG\x01" 5B
| header_len uint32 |
| header_json (encrypted-DEK envelope:
{ dek_wrapped_b64, kek_id, alg: "AES-256-GCM", nonce_b64, redacted_fields:[...] }
) |
| ciphertext (AES-256-GCM, AAD = trace_id) |
plaintext = canonical_json({ request: <ir_request>, response: <ir_response>, sse_events: [...] })
DEK = csprng(32 bytes)
ciphertext = AES-GCM(DEK, plaintext, AAD=trace_id)
DEK_wrapped = KMS.Encrypt(KEK_id, DEK, context={tenant, repo_id})
object = magic | header_len | header_json | ciphertext
KMS interface abstraction:
type KMS interface {
	// encCtx is the encryption context (e.g. tenant, repo_id) bound to the wrapped key.
	Encrypt(ctx context.Context, kekID string, plaintext []byte, encCtx EncContext) ([]byte, error)
	Decrypt(ctx context.Context, kekID string, ciphertext []byte, encCtx EncContext) ([]byte, error)
	GenerateDataKey(ctx context.Context, kekID string, spec string) (plaintext, wrapped []byte, err error)
}

Implementations: aws_kms / gcp_kms / vault_transit / local_file_kek (dev).
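
The data-key half of the envelope (everything except the KMS wrap) can be sketched with the Go standard library. This is a minimal illustration: `Seal` and `Open` are hypothetical helper names, and wrapping the returned DEK via `KMS.Encrypt` before building `header_json` is left to the caller.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

func newGCM(dek []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(dek) // a 32-byte key selects AES-256
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// Seal encrypts plaintext under a fresh random DEK with AES-256-GCM and binds
// trace_id as additional authenticated data. The DEK is returned in the clear
// so the caller can wrap it with the KEK and then discard it.
func Seal(plaintext []byte, traceID string) (dek, nonce, ciphertext []byte, err error) {
	dek = make([]byte, 32)
	if _, err = rand.Read(dek); err != nil {
		return nil, nil, nil, err
	}
	gcm, err := newGCM(dek)
	if err != nil {
		return nil, nil, nil, err
	}
	nonce = make([]byte, gcm.NonceSize())
	if _, err = rand.Read(nonce); err != nil {
		return nil, nil, nil, err
	}
	return dek, nonce, gcm.Seal(nil, nonce, plaintext, []byte(traceID)), nil
}

// Open decrypts and authenticates; it fails if trace_id does not match the AAD.
func Open(dek, nonce, ciphertext []byte, traceID string) ([]byte, error) {
	gcm, err := newGCM(dek)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, nonce, ciphertext, []byte(traceID))
}

func main() {
	dek, nonce, ct, err := Seal([]byte(`{"request":{}}`), "trace-123")
	if err != nil {
		panic(err)
	}
	pt, err := Open(dek, nonce, ct, "trace-123")
	fmt.Println(string(pt), err)
	_, err = Open(dek, nonce, ct, "trace-456")
	fmt.Println(err != nil) // true: AAD mismatch is rejected
}
```

Binding `trace_id` as AAD means an object copied under another trace's key path fails authentication instead of decrypting silently.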
Review v0.2.1 #2: `object_uri` / `object_size` / `kek_id` have no values in `metadata_only` mode. Make them nullable + use a `storage_policy` column + CHECK constraint for consistency.
-- raw_record (Postgres) — single table covering metadata_only / redacted_only / full
CREATE TABLE raw_record (
trace_id UUID PRIMARY KEY,
tenant_id TEXT NOT NULL,
user_id TEXT NOT NULL,
team_id TEXT NOT NULL,
repo_id TEXT,
sensitivity TEXT CHECK (sensitivity IN ('low','medium','high','unknown')),
storage_policy TEXT NOT NULL CHECK (storage_policy IN
('metadata_only','redacted_only','full','disabled')),
-- v0.2.1: the three columns below are filled only for redacted_only / full
object_uri TEXT,
object_size INTEGER,
kek_id TEXT,
redacted_fields JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
expires_at TIMESTAMPTZ NOT NULL,
deleted_at TIMESTAMPTZ,
-- Consistency constraint: storage_policy and the object columns must match
CONSTRAINT raw_record_object_consistency CHECK (
(storage_policy IN ('redacted_only','full')
AND object_uri IS NOT NULL AND object_size IS NOT NULL AND kek_id IS NOT NULL)
OR
(storage_policy = 'metadata_only'
AND object_uri IS NULL AND object_size IS NULL AND kek_id IS NULL)
)
);
CREATE INDEX ON raw_record (tenant_id, repo_id, created_at DESC);
CREATE INDEX ON raw_record (expires_at) WHERE deleted_at IS NULL;
-- disabled mode does not write this table at all (egress pipeline skips)

Column-state matrix
| `storage_policy` | object_uri | object_size | kek_id | redacted_fields | egress behavior |
|---|---|---|---|---|---|
| `metadata_only` (P0/P1 default) | NULL | NULL | NULL | NULL | Write metadata row only |
| `redacted_only` (P2+) | filled | filled | filled | filled | redact → DEK encrypt → S3 |
| `full` (P2+) | filled | filled | filled | NULL | DEK encrypt → S3 |
| `disabled` | n/a | n/a | n/a | n/a | Skip the table entirely |
Review #9: v0.1's 30-day full raw default was both a sales blocker and a compliance blocker. v0.2 defaults to `metadata_only`; `redacted_only` / `full` must be opted in per repo. MVP P0 does not deploy KMS / object storage (see §18).
- Default `default_storage_policy: metadata_only` (only writes the `raw_record` metadata row; no S3 object; no KMS call)
- Default TTL 30 days (metadata rows)

`mode` values:
| mode | Behavior |
|---|---|
| `metadata_only` | `raw_record` row only; no S3 object |
| `redacted_only` | `safety.redact_engine` substitutes placeholders for detected secrets / PII before storing; DEK encrypted to S3 |
| `full` | Full prompt + response; DEK encrypted to S3 |
| `disabled` | No write at all |
storage:
default_storage_policy: metadata_only # [v0.2]
default_retention_days: 30
repo_overrides:
repo_payments_core:
mode: disabled # extremely sensitive: keep nothing
repo_eval_corpus:
mode: full # explicit opt-in for evaluation
retention_days: 90
repo_debug_pool:
mode: redacted_only
retention_days: 14

Minimum conditions to upgrade raw-capture capability
- `mode != metadata_only` requires: (a) the KMS provider referenced in `pools.yaml` is configured; (b) the object-storage bucket is verified writable; otherwise the startup validation fails.
- Deployment check: `aicg storagectl validate` reports each repo's effective mode + the status of physical dependencies.
- Background job (`gc_worker`) scans `expires_at < now()` hourly; calls KMS `ScheduleKeyDeletion` if supported, deletes S3 objects, soft-deletes metadata rows, writes `gc_audit`.
- `developer`: can access raw under their own user_id (non-high sensitivity)
- `team_admin`: same team; high-sensitivity defaults to a redacted view, needs break-glass
- `platform_admin`: fully visible; every access writes `raw_access_audit`
CREATE TABLE raw_access_audit (
id BIGSERIAL PRIMARY KEY,
accessed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
accessor_user TEXT NOT NULL,
accessor_role TEXT NOT NULL,
target_trace_id UUID NOT NULL,
target_repo_id TEXT,
purpose TEXT, -- filled by the caller (debug, audit, eval, ...)
break_glass BOOLEAN NOT NULL DEFAULT FALSE,
ip INET,
user_agent TEXT
);

- `mode: disabled` → egress pipeline skips `raw_store.put`; no metadata row either
- `mode: redacted_only` → before encryption, `safety.redact_engine` replaces identified secrets / PII with placeholders
- audit still records everything (audit ≠ raw)
Review v0.2.1 #8: Phase 0 does not deploy a hash chain, but the schema with `self_hash NOT NULL` blocked writes. Make it nullable + enable per phase.
-- Always append-only; the hash chain only turns on in P3
CREATE TABLE audit_event (
id BIGSERIAL PRIMARY KEY,
event_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
trace_id UUID NOT NULL,
session_id TEXT,
tenant_id TEXT NOT NULL,
user_id TEXT NOT NULL,
team_id TEXT NOT NULL,
repo_id TEXT,
event_type TEXT NOT NULL,
decision JSONB, -- v0.2 three-slot Decision snapshot
rule_ids TEXT[],
detail JSONB,
request_summary JSONB, -- envelope + classifier (no prompt text)
routed_to TEXT, -- "endpoint_id:model"
fallback_chain JSONB,
error_code TEXT,
-- chain (filled from P3; NULL allowed before P3)
prev_hash BYTEA, -- [v0.2.1] nullable; not written before P3
self_hash BYTEA -- [v0.2.1] nullable; not written before P3
);
CREATE INDEX ON audit_event (trace_id);
CREATE INDEX ON audit_event (tenant_id, team_id, event_at DESC);
CREATE INDEX ON audit_event (tenant_id, repo_id, event_at DESC);
-- P3 migration: when enabling the chain there is no need to backfill history (pre-P3 rows keep NULL hashes; forward-only chain from P3)
-- After audit.chain.enabled=true at startup, BEFORE INSERT trigger writes self_hash on new inserts
-- The audit_chain_root table starts receiving daily roots at the same time

Phase agreements
| Phase | self_hash | prev_hash | Periodic root job |
|---|---|---|---|
| P0 / P1 / P2 | NULL | NULL | Not running |
| From P3 | sha256(canonical(row)) | previous row's self_hash | Every day at 00:05 UTC, compute Merkle root by (tenant_id, id asc); write audit_chain_root + a signed copy to S3 at audit-chain/<date>/<tenant>.json |
Historical data (NULL hashes) does not participate in the chain, but pre-P3 audit can still serve as a compliance archive; for integrity tracing, combine with backup reconciliation.
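
The forward-only chain from P3 can be illustrated with a few lines of Go. This is a sketch under assumptions: `AuditRow` is a hypothetical subset of the `audit_event` columns, the canonical form is taken to be the JSON encoding of a fixed-order struct, it is assumed to include `prev_hash` so that each row commits to its predecessor, and the event type values are made up. The daily Merkle root is not shown.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// AuditRow is a hypothetical subset of audit_event columns.
type AuditRow struct {
	ID        int64  `json:"id"`
	TraceID   string `json:"trace_id"`
	EventType string `json:"event_type"`
	PrevHash  []byte `json:"prev_hash"`
	SelfHash  []byte `json:"-"` // excluded from the hashed form
}

// selfHash computes sha256(canonical(row)) over every field except self_hash.
func selfHash(r AuditRow) []byte {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err) // cannot happen for this struct of plain fields
	}
	sum := sha256.Sum256(b)
	return sum[:]
}

// Append links r to the tail of the chain and stamps its self_hash.
func Append(chain []AuditRow, r AuditRow) []AuditRow {
	r.PrevHash = nil
	if n := len(chain); n > 0 {
		r.PrevHash = chain[n-1].SelfHash
	}
	r.SelfHash = selfHash(r)
	return append(chain, r)
}

// Verify walks the rows in id order and re-derives every link.
func Verify(chain []AuditRow) bool {
	var prev []byte
	for _, r := range chain {
		if !bytes.Equal(r.PrevHash, prev) || !bytes.Equal(r.SelfHash, selfHash(r)) {
			return false
		}
		prev = r.SelfHash
	}
	return true
}

func main() {
	var chain []AuditRow
	chain = Append(chain, AuditRow{ID: 1, TraceID: "t-1", EventType: "policy_decision"})
	chain = Append(chain, AuditRow{ID: 2, TraceID: "t-2", EventType: "routing"})
	fmt.Println(Verify(chain)) // true
	chain[0].EventType = "tampered"
	fmt.Println(Verify(chain)) // false
}
```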
Review v0.2.1 #3: P0 does not deploy TimescaleDB; the schema must first land as plain tables, then P1 migrates to hypertables.
CREATE TABLE cost_event (
id BIGSERIAL PRIMARY KEY,
event_at TIMESTAMPTZ NOT NULL,
trace_id UUID NOT NULL,
attempt_no SMALLINT NOT NULL, -- 1=primary, 2..=fallback
tenant_id TEXT NOT NULL,
user_id TEXT NOT NULL,
team_id TEXT NOT NULL,
repo_id TEXT,
task_type TEXT,
policy_rule_id TEXT,
provider TEXT NOT NULL,
endpoint_id TEXT NOT NULL, -- [v0.2.1] references provider_endpoints
model TEXT NOT NULL,
pool TEXT NOT NULL,
is_private BOOLEAN NOT NULL,
input_tokens INTEGER NOT NULL,
output_tokens INTEGER NOT NULL,
cache_read_tokens INTEGER DEFAULT 0,
cache_create_tokens INTEGER DEFAULT 0,
cost_cents INTEGER NOT NULL,
cost_source TEXT NOT NULL, -- 'provider_usage' | 'estimated'
latency_ms INTEGER,
success BOOLEAN NOT NULL,
error_class TEXT
);
CREATE INDEX ON cost_event (event_at DESC);
CREATE INDEX ON cost_event (tenant_id, team_id, event_at DESC);
CREATE INDEX ON cost_event (tenant_id, user_id, event_at DESC);
CREATE INDEX ON cost_event (tenant_id, repo_id, event_at DESC);
CREATE INDEX ON cost_event (provider, model, event_at DESC);

The P0 dashboard aggregations run direct SQL (GROUP BY day); at internal-dogfood volume (< 1M rows/month) there is no bottleneck.
-- 0001_p1_timescale_migration.sql
-- 1. Enable extension
CREATE EXTENSION IF NOT EXISTS timescaledb;
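-- 2. TimescaleDB requires every unique index (including the primary key) to contain
--    the partitioning column, so widen the key before converting
ALTER TABLE cost_event DROP CONSTRAINT cost_event_pkey;
ALTER TABLE cost_event ADD PRIMARY KEY (id, event_at);
-- 3. Convert to hypertable (migrate_data moves the existing rows into chunks)
SELECT create_hypertable('cost_event', 'event_at',
chunk_time_interval => INTERVAL '7 days',
migrate_data => true);
-- 4. Continuous aggregate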
CREATE MATERIALIZED VIEW cost_daily
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 day', event_at) AS day,
tenant_id, team_id, user_id, repo_id, provider, model, pool, task_type,
SUM(input_tokens) AS input_tokens,
SUM(output_tokens) AS output_tokens,
SUM(cost_cents) AS cost_cents,
COUNT(*) AS n_requests,
SUM(CASE WHEN success THEN 0 ELSE 1 END) AS n_failed
FROM cost_event
GROUP BY day, tenant_id, team_id, user_id, repo_id, provider, model, pool, task_type;
SELECT add_continuous_aggregate_policy('cost_daily',
start_offset => INTERVAL '7 days',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '30 minutes');

-- P0: plain Postgres table
CREATE TABLE routing_event (
event_at TIMESTAMPTZ NOT NULL,
trace_id UUID NOT NULL,
attempt_no SMALLINT NOT NULL,
tenant_id TEXT NOT NULL,
decision JSONB NOT NULL, -- {primary_action, modifiers, side_effects, ...}
pool_selected TEXT,
member_selected JSONB, -- {endpoint_id, provider, model, private}
fallback_chain JSONB,
degraded_features TEXT[],
breaker_state TEXT,
seed BYTEA, -- [v0.2] sha256(trace_id:pool:attempt_no)
PRIMARY KEY (trace_id, attempt_no)
);
CREATE INDEX ON routing_event (event_at DESC);
CREATE INDEX ON routing_event (tenant_id, event_at DESC);
-- P1 migration: SELECT create_hypertable('routing_event', 'event_at',
-- chunk_time_interval => INTERVAL '7 days', migrate_data => true);

Review #4: the hard cap cannot rely on after-the-fact `Consume`.
CREATE TABLE budget_reservation (
id UUID PRIMARY KEY,
trace_id UUID NOT NULL,
tenant_id TEXT NOT NULL,
user_id TEXT NOT NULL,
team_id TEXT NOT NULL,
estimated_cents INTEGER NOT NULL,
actual_cents INTEGER, -- nullable until commit
state TEXT NOT NULL, -- 'reserved' | 'committed' | 'released' | 'expired'
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
settled_at TIMESTAMPTZ
);
CREATE INDEX ON budget_reservation (state, created_at);
CREATE INDEX ON budget_reservation (tenant_id, team_id, created_at DESC);

View (includes the reserved portion) for the dashboard "reserved + settled" display:
CREATE VIEW budget_team_monthly_v AS
SELECT
tenant_id, team_id,
date_trunc('month', created_at) AS month,
SUM(CASE WHEN state IN ('reserved','committed') THEN
COALESCE(actual_cents, estimated_cents) ELSE 0 END) AS used_cents
FROM budget_reservation
GROUP BY tenant_id, team_id, month;

Settler background job: every minute, scans for state='reserved' AND created_at < now() - INTERVAL '10 minutes', auto-releases the matching rows, and writes a reservation_expired audit event. It also raises an alert, because an expired reservation indicates a Reserve leak somewhere in the GW path.
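The reserve / commit / release lifecycle and the settler sweep described above can be sketched in memory. This is a minimal illustration under stated assumptions, not the service implementation: `Ledger`, `row`, and `settle` are hypothetical names, a slice stands in for the `budget_reservation` table, and rejecting a reservation when used + estimate exceeds the cap is an assumed reading of the hard-cap rule.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrHardCapExceeded mirrors the error named in the budget.Service contract.
var ErrHardCapExceeded = errors.New("budget: hard cap exceeded")

// row is a hypothetical in-memory stand-in for one budget_reservation record.
type row struct {
	estimated, actual int
	state             string // reserved | committed | released | expired
	createdAt         time.Time
}

// Ledger tracks a single cap (say, one team's month). Not goroutine-safe.
type Ledger struct {
	capCents int
	rows     []*row
}

func NewLedger(capCents int) *Ledger { return &Ledger{capCents: capCents} }

// UsedCents follows budget_team_monthly_v: reserved rows count at their
// estimate, committed rows at their actual cost, everything else at zero.
func (l *Ledger) UsedCents() int {
	total := 0
	for _, r := range l.rows {
		switch r.state {
		case "reserved":
			total += r.estimated
		case "committed":
			total += r.actual
		}
	}
	return total
}

// Reserve holds estimatedCents before routing and returns a reservation ID.
func (l *Ledger) Reserve(estimatedCents int) (int, error) {
	if l.UsedCents()+estimatedCents > l.capCents {
		return 0, ErrHardCapExceeded
	}
	l.rows = append(l.rows, &row{estimated: estimatedCents, state: "reserved", createdAt: time.Now()})
	return len(l.rows) - 1, nil
}

// settle moves an open reservation to a terminal state.
func (l *Ledger) settle(id int, state string, actual int) error {
	if id < 0 || id >= len(l.rows) || l.rows[id].state != "reserved" {
		return fmt.Errorf("budget: reservation %d is not open", id)
	}
	l.rows[id].state, l.rows[id].actual = state, actual
	return nil
}

func (l *Ledger) Commit(id, actualCents int) error { return l.settle(id, "committed", actualCents) }
func (l *Ledger) Release(id int) error             { return l.settle(id, "released", 0) }

// SweepExpired expires reservations left open for longer than ttl.
func (l *Ledger) SweepExpired(ttl time.Duration) int {
	n := 0
	for _, r := range l.rows {
		if r.state == "reserved" && time.Since(r.createdAt) > ttl {
			r.state = "expired"
			n++
		}
	}
	return n
}

func main() {
	l := NewLedger(1000)
	id, _ := l.Reserve(400)
	_ = l.Commit(id, 250) // the 150-cent delta is freed implicitly
	_, err := l.Reserve(800)
	fmt.Println(l.UsedCents(), err)
}
```

Because usage is always recomputed from row state, committing a lower actual cost releases the difference without a separate bookkeeping step. A Postgres-backed version would need the cap check and the insert to run in one transaction.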
CREATE TABLE approval_request (
id UUID PRIMARY KEY,
trace_id UUID NOT NULL,
tenant_id TEXT NOT NULL,
requestor TEXT NOT NULL,
team_id TEXT NOT NULL,
repo_id TEXT,
decision_snapshot JSONB NOT NULL,
state TEXT NOT NULL, -- pending|approved|rejected|expired
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
resolved_at TIMESTAMPTZ,
resolver TEXT,
note TEXT
);

All RESTful + JSON; streaming interfaces use SSE. Every response carries X-AICG-Trace-Id.
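A minimal sketch of a streaming response that follows these conventions: it sets X-AICG-Trace-Id, relays chunks as SSE, and ends with the `aicg.usage` event. The `usage` payload shape, `streamHandler`, and `record` are assumptions made for illustration (the field names are borrowed from the cost summary response), and chunks are assumed to contain no newlines.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// usage is a hypothetical payload for the terminating aicg.usage event.
type usage struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
	CostCents    int `json:"cost_cents"`
}

// streamHandler relays chunks as SSE data events and closes the stream
// with a single aicg.usage event.
func streamHandler(traceID string, chunks []string, u usage) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		flusher, ok := w.(http.Flusher)
		if !ok {
			http.Error(w, "streaming unsupported", http.StatusInternalServerError)
			return
		}
		w.Header().Set("Content-Type", "text/event-stream")
		w.Header().Set("Cache-Control", "no-cache")
		w.Header().Set("X-AICG-Trace-Id", traceID)
		for _, c := range chunks {
			fmt.Fprintf(w, "data: %s\n\n", c)
			flusher.Flush() // push each chunk to the client immediately
		}
		payload, _ := json.Marshal(u)
		fmt.Fprintf(w, "event: aicg.usage\ndata: %s\n\n", payload)
		flusher.Flush()
	}
}

// record runs a handler against an in-memory recorder and returns the
// response headers and body.
func record(h http.Handler) (http.Header, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/stream", nil))
	return rec.Header(), rec.Body.String()
}

func main() {
	hdr, body := record(streamHandler("trace-123", []string{"hello", "world"}, usage{12, 34, 5}))
	fmt.Println(hdr.Get("X-AICG-Trace-Id"))
	fmt.Print(body)
}
```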
Review #8: v0.1 had three concurrent shapes: (a) the client API-key system, (b) email + password login, and (c) the threat-model line "MVP password + bcrypt only". v0.2 collapses these into a single API key + RBAC model.
GET /api/v1/whoami # echo: {user_id, role, team_id}
POST /api/v1/admin/invites # platform_admin mints invites (role: developer|team_admin|platform_admin)
POST /api/v1/admin/users/{id}/role # change role
GET /api/v1/admin/users
GET /api/v1/me/api-keys # current user's key list (returns last4 + created_at only)
POST /api/v1/me/api-keys # self-mint a key (plaintext returned once)
DELETE /api/v1/me/api-keys/{id} # revoke
POST /api/v1/admin/users/{id}/api-keys/revoke-all # platform_admin emergency revoke
Bootstrap: GW auto-generates a one-shot setup_token on first start (written to stderr and /var/lib/agentgate/setup-token.txt); admins use:
aicg login --gateway https://gw.internal --setup <token> # [v0.2.4] CLI uses space-separated flag values
The command exchanges the token for the first platform_admin API key, and the setup_token is invalidated immediately. All subsequent admin / user provisioning goes through the invite flow.
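The one-shot exchange can be sketched as follows. `Bootstrap`, `newSecret`, and `ErrBadSetupToken` are hypothetical names, URL-safe base64 is an assumed encoding, and persisting the new key (hashed) is left out.

```go
package main

import (
	"crypto/rand"
	"crypto/subtle"
	"encoding/base64"
	"errors"
	"fmt"
	"sync"
)

var ErrBadSetupToken = errors.New("auth: setup token invalid or already used")

// Bootstrap holds the one-shot setup_token generated on first start.
type Bootstrap struct {
	mu    sync.Mutex
	token string // empty once consumed
}

// newSecret returns n random bytes as unpadded URL-safe base64.
func newSecret(n int) (string, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(b), nil
}

// NewBootstrap creates the setup token; the caller prints or stores it.
func NewBootstrap() (*Bootstrap, string, error) {
	tok, err := newSecret(32)
	if err != nil {
		return nil, "", err
	}
	return &Bootstrap{token: tok}, tok, nil
}

// Exchange trades the setup token for the first platform_admin API key.
// The token is cleared on the first successful exchange.
func (b *Bootstrap) Exchange(presented string) (string, error) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.token == "" || subtle.ConstantTimeCompare([]byte(presented), []byte(b.token)) != 1 {
		return "", ErrBadSetupToken
	}
	key, err := newSecret(32) // 256 bits of key material
	if err != nil {
		return "", err
	}
	b.token = ""
	return key, nil
}

func main() {
	b, tok, err := NewBootstrap()
	if err != nil {
		panic(err)
	}
	key, err := b.Exchange(tok)
	fmt.Println(len(key), err)
	_, err = b.Exchange(tok) // the second attempt is rejected
	fmt.Println(err)
}
```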
Dashboard: uses the same API key as Authorization: Bearer; the UI carries it via an HttpOnly + SameSite=Strict cookie; there is no separate email/password login.
No logout endpoint: revocation = revoke the API key. After Phase 1 introduces OIDC device-code, the session concept will be added.
GET /api/v1/cost/summary
?dim=team|user|repo|model|provider|task_type|policy_rule
&group_by=day|week|month
&from=...&to=...&team_id=...&repo_id=...
-> [{ bucket, dim_value, cost_cents, input_tokens, output_tokens, n_requests, n_failed }]
GET /api/v1/cost/breakdown
?trace_id=... | ?session_id=...
-> Single trace's fallback chain and per-attempt cost
GET /api/v1/routing/recent
?team_id=...&limit=...
-> Most recent routing events with degraded_features
GET /api/v1/budget
-> Current budget state: team monthly / user daily (with soft/hard thresholds and usage)
GET /api/v1/audit/search
?trace_id=... | ?user_id=... | ?repo_id=... | ?event_type=... | ?from=...&to=...
&cursor=...&limit=50
-> { items:[audit_event], next_cursor }
GET /api/v1/audit/{trace_id}
-> Event chain + routing + cost + raw reference for one trace (raw visibility per RBAC)
GET /api/v1/audit/chain/verify?date=YYYY-MM-DD
-> Verify a day's hash-chain root
GET /api/v1/raw/{trace_id}?purpose=debug&break_glass=false
-> Decrypted output (auto redacted view based on role / sensitivity)
side-effect: writes raw_access_audit
GET /api/v1/approvals?state=pending
POST /api/v1/approvals/{id}/approve { note }
POST /api/v1/approvals/{id}/reject { note }
GET /sse/v1/approvals/stream # Long-lived SSE stream; pushes pending changes
GET /api/v1/policy/current # Parsed YAML (platform_admin only)
GET /api/v1/pools/current
POST /api/v1/policy/reload # Trigger hot reload (platform_admin only)
GET /api/v1/policy/decision/explain # Simulate decision: pass envelope + classifier, get Decision + matched rules
GET /api/v1/webhooks
POST /api/v1/webhooks # Create an outbound webhook
DELETE /api/v1/webhooks/{id}
POST /api/v1/webhooks/{id}/test
POST /api/v1/lp/exchange-invite # body: { invite_token, machine_id }
# Returns long-lived user API key
GET /api/v1/lp/policy-snippet # LP pulls a hash-cached policy summary applicable to the machine
POST /api/v1/repo/bind # body: { remote_url, head_sha, machine_id }
# Returns a signed binding token
GET /api/v1/lp/version-check
| Layer | Choice | Rationale |
|---|---|---|
| Gateway primary language | Go 1.22+ | Single binary, concurrency model, mature KMS/AWS/GCP SDKs, native cel-go |
| LP primary language | Go (same as GW) | Cross-platform single binary; reuse transformer / scan / wire packages |
| HTTP routing | chi/v5 | Lightweight, composable, middleware-friendly |
| Streaming | net/http + r/w.Flusher | No additional SSE library |
| Policy expression | cel-go | YAML + CEL is the industry de-facto standard |
| Config hot reload | fsnotify + atomic double-buffer | |
| Secret scan | gitleaks ruleset fork (Go-native) + detect-secrets (gRPC sidecar, Python) | Direct gitleaks Go calls; detect-secrets has broader coverage |
| Tokenizer | tiktoken-go (OpenAI) + anthropic-tokenizer (Anthropic) | |
| Provider SDK | Official Anthropic / OpenAI SDKs; others via OpenAI-compatible HTTP | |
| Postgres driver | pgx/v5 | Performance, connection pooling |
| TimescaleDB | Postgres extension | Same instance as the main DB |
| Object storage | S3-compatible (aws-sdk-go-v2 s3) | MinIO / AWS S3 / GCS (via the S3-compatible layer) |
| KMS abstraction | aws-sdk-go-v2 kms / gcp kms / vault api / local file | Unified interface |
| Circuit breaker | sony/gobreaker | |
| Rate limit | golang.org/x/time/rate | In-memory token bucket |
| Logging | zerolog or slog (stdlib 1.22+) | Structured logs |
| Tracing | OpenTelemetry SDK + jaeger/otlp | Optional, BYO at the enterprise |
| Metrics | Prometheus client_golang | /metrics endpoint |
| Migration | golang-migrate/migrate | Auto-runs at startup |
| Dashboard | Next.js 14 (app router) + Tailwind + shadcn/ui | Static export embedded in the GW binary (embed.FS) |
| Tests | go test + testcontainers-go (Postgres + MinIO) | |
| Local dev | docker-compose (gw + postgres+ts + minio + ollama optional) | |
| Container | Distroless static | Minimal single-binary image |
Alternative: if the team has materially stronger TypeScript experience (CCR fork background), the LP could be in TS (keep single-process dual endpoints + Node fastify); the GW is still strongly recommended to be Go (multi-process streaming + KMS / object storage + performance).
agentgate/
├── README.md
├── LICENSE
├── go.mod
├── go.sum
├── Makefile
├── docker-compose.yaml # Local dev: gw + postgres-ts + minio
├── Dockerfile
├── docs/
│ └── architecture/
│ ├── SYSTEM-DESIGN.md # this file
│ ├── envelope-schema.md
│ ├── policy-cookbook.md
│ └── threat-model.md
│
├── cmd/
│ ├── aicg-gw/
│ │ └── main.go # gateway entry
│ └── aicg-lp/
│ └── main.go # local proxy entry
│
├── internal/
│ ├── shared/ # shared between GW + LP
│ │ ├── envelope/ # Metadata envelope schema
│ │ ├── ir/ # Anthropic IR types
│ │ ├── transformer/ # IR ↔ provider wire
│ │ │ ├── anthropic.go
│ │ │ ├── openai.go
│ │ │ └── degraded.go
│ │ ├── tokenizer/
│ │ ├── scan/ # gitleaks rules wrapper + redact engine
│ │ ├── wire/ # SSE helpers, JSON streaming
│ │ ├── secretref/ # env://, file://, vault:// resolution
│ │ └── version/
│ │
│ ├── lp/ # local proxy only
│ │ ├── server/
│ │ │ ├── anthropic.go # /anthropic/* handler
│ │ │ ├── openai.go # /openai/* handler
│ │ │ └── meta.go # /_aicg/* handler
│ │ ├── tagger/ # heuristic v0
│ │ ├── session/
│ │ ├── repobinder/
│ │ ├── gwclient/ # mTLS, retry, timeout
│ │ ├── prescan/
│ │ ├── localconfig/ # ~/.aicg/*
│ │ └── cli/
│ │ ├── login.go
│ │ ├── start.go
│ │ ├── status.go
│ │ ├── bind.go
│ │ ├── doctor.go
│ │ └── env.go
│ │
│ └── gw/ # gateway only
│ ├── edge/
│ │ ├── tls.go
│ │ ├── ratelimit.go
│ │ └── tracing.go
│ ├── auth/
│ │ ├── apikey.go
│ │ ├── rbac.go
│ │ └── invites.go
│ ├── repobinding/
│ ├── safety/
│ │ ├── input_scanner.go
│ │ ├── output_scanner.go
│ │ └── redactor.go
│ ├── classifier/ # server-side reclassifier
│ │ └── server_reclassifier.go
│ ├── policy/
│ │ ├── loader.go # YAML + fsnotify
│ │ ├── engine.go # CEL decisions
│ │ ├── conflict.go # priority + deny-overrides
│ │ └── decision.go
│ ├── routing/
│ │ ├── pools.go
│ │ ├── selector.go
│ │ ├── breaker.go
│ │ └── execute.go
│ ├── provider/
│ │ ├── adapter.go # interface
│ │ ├── anthropic.go
│ │ ├── anthropic_bedrock.go
│ │ ├── openai.go
│ │ ├── azure_openai.go
│ │ ├── openrouter.go
│ │ ├── litellm.go
│ │ └── openai_compat.go # ollama / vllm / tgi
│ ├── budget/
│ ├── cost/
│ │ ├── pricing.go # Pricing YAML loader
│ │ ├── extractor.go
│ │ └── estimator.go
│ ├── audit/
│ │ ├── writer.go
│ │ └── chain.go # Hash chain periodic job
│ ├── rawstore/
│ │ ├── store.go
│ │ ├── kms.go # KMS interface
│ │ ├── kms_aws.go
│ │ ├── kms_gcp.go
│ │ ├── kms_vault.go
│ │ ├── kms_localfile.go
│ │ └── gc_worker.go
│ ├── webhook/
│ ├── approval/
│ ├── dashboard/
│ │ ├── api.go # REST API routes
│ │ ├── sse.go
│ │ └── ui_embed.go # embed.FS static assets
│ ├── db/
│ │ ├── migrations/
│ │ ├── postgres.go
│ │ └── timescale.go
│ └── server/ # main wiring
│ ├── server.go
│ ├── ingress_pipeline.go
│ ├── egress_pipeline.go
│ └── handlers.go
│
├── configs/
│ ├── policies/
│ │ └── main.yaml
│ ├── pools.yaml
│ ├── budgets.yaml
│ ├── identity/
│ │ ├── users.yaml # user_id, email, team_id
│ │ ├── teams.yaml
│ │ └── repos.yaml # repo_id, owners, sensitivity
│ ├── webhooks.yaml
│ └── pricing.yaml
│
├── scripts/
│ ├── dev/
│ ├── load-test/
│ └── policy-lint/
│
├── tests/
│ ├── e2e/
│ ├── conformance/
│ │ ├── claude_code/ # Spin up a Claude Code simulator and run end-to-end
│ │ └── openai_clients/ # cursor/aider style
│ └── fixtures/
│
└── ui/ # Next.js dashboard
├── app/
├── components/
├── lib/
└── package.json
;; HTTP request: POST /v1/agent/forward
Authorization = "Bearer" SP user-api-key ; user-api-key = 256-bit base64
X-AICG-LP-Version = semver
X-AICG-Session-Id = 64HEXDIG
X-AICG-Repo-Binding = base64( ed25519-signed CBOR { repo_id, machine_id, exp } )
Accept = "text/event-stream" / "application/json"
;; HTTP request body
body = JSON {
"envelope": AICGEnvelope,
"wire": {
"protocol": "anthropic_messages" / "openai_chat_completions",
"stream": bool,
"body": <upstream-original-body>
}
}
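A minimal sketch of decoding and validating this request body, assuming hypothetical type names `ForwardRequest` and `Wire`. The envelope and the upstream body are kept as raw JSON so that later pipeline stages can parse them against their own schemas.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Wire carries the client's original request, untouched.
type Wire struct {
	Protocol string          `json:"protocol"`
	Stream   bool            `json:"stream"`
	Body     json.RawMessage `json:"body"`
}

// ForwardRequest is the body of POST /v1/agent/forward.
type ForwardRequest struct {
	Envelope json.RawMessage `json:"envelope"`
	Wire     Wire            `json:"wire"`
}

// ParseForward decodes the body and rejects structurally invalid requests.
func ParseForward(raw []byte) (ForwardRequest, error) {
	var req ForwardRequest
	if err := json.Unmarshal(raw, &req); err != nil {
		return req, fmt.Errorf("forward: invalid JSON: %w", err)
	}
	if len(req.Envelope) == 0 {
		return req, errors.New("forward: missing envelope")
	}
	switch req.Wire.Protocol {
	case "anthropic_messages", "openai_chat_completions":
		// the two wire protocols named in the grammar
	default:
		return req, fmt.Errorf("forward: unsupported wire protocol %q", req.Wire.Protocol)
	}
	if len(req.Wire.Body) == 0 {
		return req, errors.New("forward: missing wire.body")
	}
	return req, nil
}

func main() {
	raw := []byte(`{"envelope":{"schema_version":"1.0.0"},
		"wire":{"protocol":"anthropic_messages","stream":true,"body":{"model":"example"}}}`)
	req, err := ParseForward(raw)
	fmt.Println(req.Wire.Protocol, req.Wire.Stream, err)
}
```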
package policy
type Engine interface {
Decide(ctx context.Context, in DecideInput) (Decision, error)
Reload(ctx context.Context) error
CurrentVersion() string
}
type DecideInput struct {
Envelope envelope.AICGEnvelope
ServerClass classifier.Output // task_type, complexity, sensitivity (server-trusted)
User auth.User
Team auth.Team
Repo auth.Repo
Budget budget.Snapshot
TraceID string // used to derive rand() deterministically
}
type Decision struct {
PrimaryAction string // [v0.2] single-pick: allow|block|route|route_to_private_model|require_approval
Modifiers []string // [v0.2] redact|escalate_to_strong_model
SideEffects []string // [v0.2] shadow_eval|log_only
ModelPool string
ShadowPool string
Redactions []scan.Redaction
Reasons []string // hit rule_ids
Explanation string // human readable
// [v0.2.3] Endpoint constraints — RoutingEngine uses these to filter candidate members
RequiredTrustTier string // "" | "vendor" | "partner" | "private"
RequiredDataResidency []string // allowed set; empty = unrestricted
RequiredCapabilities []string // required supports.* keys
DegradedFeatures []string // populated during routing
RequireApprovalID string
}

package routing
type Engine interface {
// [v0.2.3] BuildChain filters the registry by Decision's RequiredTrustTier / DataResidency / Capabilities
BuildChain(pool string, decision policy.Decision, traceID string, attemptNo int) ([]Member, error)
Execute(ctx context.Context, req ir.Request, chain []Member, traceID string) (Result, error)
ExecuteStream(ctx context.Context, req ir.Request, chain []Member, traceID string) (StreamResult, error)
}
// [v0.2.3] Error: no candidate after constraint filtering
var ErrNoCandidate = errors.New("routing: no candidate after constraint filter")
type Result struct {
Response *ir.Response
Member Member
AttemptNo int
UsedFallbacks []FallbackHop
Latency time.Duration
Degraded []string
}

See §10.
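For the routing Engine above, replayable selection follows from deriving the random source from (trace_id, pool, attempt_no) as BuildChain receives them. The sketch below assumes a colon-joined string as the SHA-256 input and a flat weighted draw; the simplified `Member` shape, `Seed`, and `Pick` are hypothetical names, and the real byte encoding is not specified here.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/rand/v2"
)

// Member is a simplified pool member for this sketch.
type Member struct {
	EndpointID string
	Weight     int
}

// Seed derives the 32-byte value that would be stored in routing_event.seed.
func Seed(traceID, pool string, attemptNo int) [32]byte {
	return sha256.Sum256([]byte(fmt.Sprintf("%s:%s:%d", traceID, pool, attemptNo)))
}

// Pick performs a weighted draw whose outcome depends only on the seed and
// on the order and weights of members.
func Pick(members []Member, seed [32]byte) (Member, bool) {
	total := 0
	for _, m := range members {
		if m.Weight > 0 {
			total += m.Weight
		}
	}
	if total == 0 {
		return Member{}, false // no eligible candidate
	}
	rng := rand.New(rand.NewChaCha8(seed))
	n := rng.IntN(total)
	for _, m := range members {
		if m.Weight <= 0 {
			continue
		}
		if n < m.Weight {
			return m, true
		}
		n -= m.Weight
	}
	return Member{}, false // unreachable: n is always below total
}

func main() {
	pool := []Member{{"anthropic-primary", 3}, {"ollama-cluster", 1}}
	seed := Seed("0190f3a2-7c1e-7b3a-9c55-2f1d0a6b4e11", "standard", 1)
	m, _ := Pick(pool, seed)
	fmt.Printf("seed=%x pick=%s\n", seed[:8], m.EndpointID)
}
```

Replaying a trace then amounts to re-running `Pick` with the stored seed against the same member list. If the pool definition changed in the meantime, the replay can legitimately differ.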
package rawstore
type Store interface {
Put(ctx context.Context, in PutInput) (PutOutput, error)
Get(ctx context.Context, traceID string, role auth.Role, purpose string, breakGlass bool) (Plaintext, error)
Delete(ctx context.Context, traceID string) error
}
type PutInput struct {
TraceID string
TenantID, UserID, TeamID, RepoID string
Sensitivity string
Plaintext []byte // canonical_json(request+response+events)
StoragePolicy string // 'full', 'redacted_only', 'disabled'
Redactions []scan.Redaction
RetentionDays int
}

See §11.2.
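The DEK / KEK envelope scheme behind the raw store can be illustrated with stdlib AES-256-GCM. This is a sketch under assumptions: `Sealed`, `Encrypt`, and `Decrypt` are hypothetical names, and the KEK is a local byte slice here, whereas with a real KMS the wrap and unwrap steps would be remote calls and the KEK would never be held by the gateway.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// Sealed is a hypothetical stored record: payload ciphertext plus wrapped DEK.
type Sealed struct {
	WrappedDEK []byte // DEK encrypted under the KEK
	Ciphertext []byte // payload encrypted under the DEK
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts plaintext and prefixes the random nonce to the output.
func seal(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open reverses seal and fails if the data was tampered with.
func open(key, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("rawstore: ciphertext too short")
	}
	nonce, ct := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, ct, nil)
}

// Encrypt generates a fresh DEK per record and wraps it under the KEK.
func Encrypt(kek, plaintext []byte) (Sealed, error) {
	dek := make([]byte, 32)
	if _, err := rand.Read(dek); err != nil {
		return Sealed{}, err
	}
	ct, err := seal(dek, plaintext)
	if err != nil {
		return Sealed{}, err
	}
	wrapped, err := seal(kek, dek)
	if err != nil {
		return Sealed{}, err
	}
	return Sealed{WrappedDEK: wrapped, Ciphertext: ct}, nil
}

// Decrypt unwraps the DEK with the KEK, then decrypts the payload.
func Decrypt(kek []byte, s Sealed) ([]byte, error) {
	dek, err := open(kek, s.WrappedDEK)
	if err != nil {
		return nil, fmt.Errorf("rawstore: unwrap DEK: %w", err)
	}
	return open(dek, s.Ciphertext)
}

func main() {
	kek := make([]byte, 32) // stand-in for a KEK held by the KMS
	if _, err := rand.Read(kek); err != nil {
		panic(err)
	}
	s, err := Encrypt(kek, []byte(`{"trace":"demo"}`))
	if err != nil {
		panic(err)
	}
	pt, err := Decrypt(kek, s)
	fmt.Println(string(pt), err)
}
```

One practical consequence of the per-record DEK is that rotating the KEK only requires re-wrapping the small DEKs, not re-encrypting stored payloads.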
package audit
type Writer interface {
Append(ctx context.Context, ev Event) error
}
type ChainSigner interface {
SealDay(ctx context.Context, date time.Time, tenantID string) (Root, error)
Verify(ctx context.Context, date time.Time, tenantID string) (bool, Root, error)
}

package budget
type ReservationID string
type Service interface {
// Query current budget state during decision (includes the reserved portion)
Check(ctx context.Context, q Query) (Snapshot, error)
// Reserve before routing; over hard cap returns ErrHardCapExceeded
Reserve(ctx context.Context, q Query, estimatedCents int) (ReservationID, error)
// On upstream success → replace estimated with actualCents; delta auto-released
Commit(ctx context.Context, id ReservationID, actualCents int) error
// On upstream failure / client disconnect → full release
Release(ctx context.Context, id ReservationID) error
}
type Snapshot struct {
TeamMonthlyUsedCents int // committed
TeamMonthlyReservedCents int // outstanding reservations
TeamMonthlyCapCents int
UserDailyUsedCents int
UserDailyReservedCents int
UserDailyCapCents int
SoftHit bool // (used + reserved) >= 80% cap
HardHit bool // (used + reserved) >= 100% cap
}
// Background settler interface
type Settler interface {
// Periodically cleans up Reservations that have not been committed/released for longer than (default 10min) → Release + alert
SweepExpired(ctx context.Context, ttl time.Duration) (releasedCount int, err error)
}

| Risk | Impact | Mitigation |
|---|---|---|
| Anthropic policy risk (unauthorized wire compatibility) | Legal / commercial bans | Do not replace the Anthropic API key; GW is enterprise self-hosted with the enterprise-owned API key; traffic does not leave the enterprise boundary (except for legitimate upstream calls) |
| Enterprise reluctance to install a client | Onboarding friction | Single-binary LP + brew/npm/scoop channels + aicg doctor one-shot self-check; provide zero-touch deploy mode (admin pushes ~/.aicg/credentials via MDM) |
| Untrusted client-side tags | Security policy bypass | All security/routing decisions are based on the server reclassifier; client tags only act as a confidence prior |
| Streaming mid-failure fallback distortion | Poor client experience | MVP simply breaks the stream + LP auto-retries once; no seamless stitching |
| Cross-provider feature loss (cache_control, thinking) | Cost / effectiveness regression | degraded_features[] lands in routing_event; the dashboard highlights explicitly; admins can use policy to forbid cross-provider routing for specific task_types |
| Object storage + KMS failure | Blocks audit / raw writes | Raw writes are best-effort (async queue + bounded retry + DLQ); audit must succeed synchronously; KMS unavailable → fail-closed denies raw writes but requests still return normally |
| gitleaks false-positive rate | Bad developer experience | Client pre_scan uses loose rules (high sensitivity → hint only); the server uses strict rules (decision authority); aicg policy decision/explain for self-service investigation |
| Policy reload misconfigured syntax | Everyone gets 5xx | Reload-failure keeps the old version + alert webhook; the aicg policyctl validate CLI runs before admin push |
| Provider key leakage | Disaster | Secret reference schemes (vault://, env://, file://); a plaintext key in git is blocked by lint |
| Streaming post-hoc scan missing detections | Compliance risk | Dual-engine (gitleaks + detect-secrets); the output scanner carries a rule version + periodic replay of historical raw rescans; hits fire webhook + dashboard pin |
| Hash chain bypassed (admin edits DB directly) | Audit integrity | Daily root written to S3 + signed + optional forward-to external storage (write-only); platform_admin operations themselves write audit |
| Multi-provider pricing inaccuracy | Financial reconciliation drift | Provider usage fields are the authoritative source; local models use tokenizer estimation marked cost_source=estimated; monthly reconciliation with the provider's invoice landed in Phase 2 |
| LP / GW version incompatibility | Client crash | The envelope schema_version uses strict SemVer; GW supports the current and previous major; aicg doctor reports version skew |
| MVP without SSO; admin user/team drift | Cost attribution errors | users.yaml / teams.yaml in git; every change is a PR; a reconcile job periodically checks for "orphan traces" (no user/team mapping) |
| CCR mindshare | Users ask "why not CCR" | Documentation makes the differences explicit: enterprise governance, policy, audit, cost attribution, security; the OSS plan keeps the LP lightweight; developer experience aligned with CCR |
| Conflict with existing CCR env vars | Concurrent use anomalies | aicg doctor detects an already-occupied ANTHROPIC_BASE_URL and prompts; offers aicg env --check-conflicts |
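The "current and previous major" rule from the LP / GW version row above can be sketched as a small check. `Compatible` and `major` are hypothetical helpers that accept only plain MAJOR.MINOR.PATCH strings; pre-release and build-metadata suffixes are deliberately not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// major extracts the MAJOR component of a MAJOR.MINOR.PATCH version.
func major(v string) (int, error) {
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return 0, fmt.Errorf("version %q is not MAJOR.MINOR.PATCH", v)
	}
	var nums [3]int
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return 0, fmt.Errorf("version %q has a non-numeric component", v)
		}
		nums[i] = n
	}
	return nums[0], nil
}

// Compatible reports whether a gateway at gwVersion accepts an envelope at
// envVersion: the same major, or the major immediately before it.
func Compatible(gwVersion, envVersion string) (bool, error) {
	gw, err := major(gwVersion)
	if err != nil {
		return false, err
	}
	env, err := major(envVersion)
	if err != nil {
		return false, err
	}
	return env == gw || env == gw-1, nil
}

func main() {
	for _, v := range []string{"3.0.0", "2.9.4", "1.0.0", "4.0.0"} {
		ok, err := Compatible("3.1.0", v)
		fmt.Println(v, ok, err)
	}
}
```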
Review #1: v0.1 packed 6–9 months of work into Phase 0 — unreachable. v0.2 re-tiers. Each phase must independently deliver value and serve as ROI justification for the next.
Core deliverables
- LP: Anthropic endpoint only (/anthropic/v1/messages + count_tokens)
- LP: local config / repo auto-bind / aicg login/start/status/doctor/bind-repo
- LP: heuristic tagger v0
- GW: edge + auth (API key + RBAC) + repo_binding validation
- GW: server reclassifier (lightweight rules version)
- Policy engine: YAML + CEL + hot reload; only the route action (no block/redact/require_approval/shadow_eval)
- Routing engine: 4 pools + Anthropic and 1 OpenAI-compatible adapter (pick one of Ollama/vLLM)
- provider_endpoints registry validation (startup + reload reject unregistered endpoint_id)
- Three Postgres tables: audit_event / cost_event / routing_event (TimescaleDB postponed to P1)
- Pricing YAML + cost extractor (based on provider usage fields)
- Budget service: Check implemented; Reserve / Commit / Release interfaces land but cap is soft-warn only (non-blocking)
- CLI reports: aicg stats --by team|user|repo|model --from --to
Out of scope: dashboard UI, webhook, KMS, object storage, security scanning, approval, shadow_eval, TimescaleDB, additional providers (Bedrock/Azure/OpenRouter/LiteLLM, etc.), hash chain.
Exit criteria: one internal dogfood team is able to produce a weekly cost report + replayable decisions (trace_id replay). SLO: p95 GW latency < 150ms (upstream excluded); availability 99%.
- LP: second endpoint (/openai/v1/chat/completions) + transformer pipeline across wires
- Routing engine: fallback chain + circuit breaker
- Budget: Reserve/Commit/Release truly enforces the hard cap
- TimescaleDB migration + continuous aggregates
- Input pre-scan v0 (gitleaks single engine) + policy block/redact actions ship
- Policy engine: require_approval action + approval queue + generic webhook notification
- Minimal dashboard: cost views + routing recent + approval queue
- Outbound webhooks (alert / approval / violation) with HMAC + URL allowlist
Exit criteria: ≥3 teams onboard with at least 1 enterprise customer pilot; core policy enforcement is demoable (block / redact / approval); MTTR < 30 min (policy-error rollback).
- Raw store + KMS abstraction (AWS KMS / local file KEK first); default still metadata_only, opt-in per repo
- access_audit + three-layer RBAC (developer / team_admin / platform_admin)
- shadow_eval loop: dual emit to a table + simple scorecard
- More providers: OpenRouter / LiteLLM / Azure OpenAI
- Policy preview (dashboard): decision simulation + decision diff
- LP binary signature verification + self-check upgrade hints
Exit criteria: raw capture usable in production at ≥1 enterprise; shadow_eval data usable to drive P3 recommendations.
- Audit hash chain + daily-root dual write
- Second scan engine (detect-secrets sidecar)
- Anthropic Bedrock / private-domain endpoints / private_strong pool truly online
- OIDC device-code (SSO entry point)
- Post-hoc output scan (post-stream + alert)
- KMS multi-provider full support (GCP KMS, Vault Transit)
- LP platform-native key management (macOS Keychain / Windows DPAPI)
- Full SSO / SAML / SCIM
- SOC2-ready immutable audit (WORM object storage + third-party notarization)
- Multi-region / data residency
- BYO KMS + customer-managed keys
- Enterprise RBAC (ABAC)
- Multi-tenant SaaS shape
- Deep IDE integration (VS Code / JetBrains)
- Private inference cluster orchestration
- DLP hardening (third-party integration)
Why does P1 bring back input pre-scan + block/redact instead of deferring to P2 as suggested by review?
AgentGate's core narrative is "governance". Pure cost routing in P0 already overlaps with LiteLLM++/CCR forks; P1 must demonstrate "governance" to anchor enterprise sales mindshare. pre-scan v0 with gitleaks single engine is cost-controlled (≤2 weeks of work).
Why does raw store + KMS land in P2 instead of all in P0 as v0.1 had it?
After v0.2 defaults to metadata_only (review #9), P0/P1 no longer need KMS or object storage. Raw is high-value but low-urgency, so deferring it frees engineering capacity for the P0 core closed loop.
Principle: short-lived / industry-standard → hardcode; replaceable / compliance-driven / customer-specific → abstract.
| Item | Reason |
|---|---|
| Internal IR = Anthropic Messages API | The only controllable complexity; multi-IR is a disaster |
| Client wires = /anthropic/* + /openai/* two endpoints | Industry de-facto; no third worth adding |
| 12 task-type enumerations | Already tightly coupled with policy expression; extend via custom_tags |
| Policy DSL = YAML + CEL | One expression layer is enough |
| Policy conflict algorithm = priority + deny-overrides + modifier | Do not introduce a second semantics |
| Audit field fixed schema (hash chain input) | Any field change breaks the historical chain |
| Storage object key layout | Parsing / GC / migration tools all depend on it |
| trace_id = UUIDv7 | Time-ordered + unique + industry-converging |
| session_id algorithm = sha256(agent_pid + start_ts + repo) | Client can reproduce independently |
| CLI subcommand names (aicg login/start/status/bind-repo/doctor) | User recall |
| envelope schema_version field position | Cross-version compatibility foundation |
| 4 model pools (cheap/standard/strong/private_strong) | Users already bound to policy templates |
| KMS envelope encryption (DEK + KEK) | The only evolvable + compliant scheme |
| Item | Abstraction | Replacement scenarios |
|---|---|---|
| Provider adapter | provider.Adapter interface | New provider onboarding |
| Transformer | Per-direction transformer interface | Cross-provider feature mapping evolution |
| KMS | rawstore.KMS interface | AWS KMS / GCP KMS / Vault Transit / local |
| Object storage | rawstore.Backend (S3-compatible) | S3 / GCS / MinIO / Azure Blob |
| Secret reference | secretref.Resolver (env:// file:// vault://) | Add AWS SM, GCP SM |
| Identity source | auth.UserDirectory interface (MVP=YAML, Phase 1=OIDC, Phase 3=SCIM) | Without breaking callers |
| Approval channel | approval.Notifier interface | webhook / slack / email / pagerduty |
| Scan engine | scan.Engine interface | gitleaks / detect-secrets / Nightfall / Lakera |
| Pricing source | cost.Pricer interface | YAML table / provider billing API / third party |
| Tokenizer | tokenizer.Tokenizer per provider | tiktoken / anthropic / in-house |
| Webhook transport | webhook.Transport interface | http / kafka / sqs |
| Tenant resolver | tenant.Resolver interface (MVP single-tenant constant) | Future SaaS multi-tenant |
| Region locator | region.Locator interface (MVP single-region constant) | Future data residency |
| Server reclassifier | classifier.Reclassifier interface | Future ML classifier |
| Policy storage | policy.Source interface (MVP=fs/yaml) | DB / remote config center |
| Metrics emitter | OpenTelemetry / Prometheus interface | Not vendor-locked |
- Multi-language SDKs: the MVP only has the LP as a single client; do not build client SDKs
- Plugin / extension mechanism: the MVP's built-in scanner is enough; do not open up external plugins
- Multi-policy-file / policy merge: the MVP uses a single policies/main.yaml; do not introduce namespacing
- Approval workflow DSL: single-step approval is enough for the MVP; no multi-stage / delegation
- Cross-cluster synchronization: the MVP is single-instance; no leader election / consensus
The minimum demoable set for P0. Each item references a file path and contract anchor for easy task pickup. All 22 items must be green before P0 is declared done.
- R-1 Go module init (go mod init agentgate); Makefile + Dockerfile (distroless static)
- R-2 docker-compose.yaml: gw + Postgres (without TimescaleDB / MinIO / KMS)
- R-3 golang-migrate integrated; internal/gw/db/migrations/ contains schema 0001 (plain Postgres, see §11.3 §12.1 §12.2 §12.3 §12.4 §12.5)
- R-4 Postgres tables: raw_record (with CHECK constraint) / audit_event (self_hash NULL) / cost_event / routing_event / approval_request / budget_reservation / raw_access_audit / audit_chain_root
- R-5 configs/pools.yaml includes a provider_endpoints registry (with trust_tier/data_residency/supports); startup validates that every pool member.endpoint_id is in the registry
- R-6 configs/policies/main.yaml only contains route/allow style rules; block/redact/require_approval go to P1
- R-7 configs/pricing.yaml includes at least Anthropic flagship models + 1 openai_compat (cost can be 0)
- R-8 POST /v1/agent/forward (§5.4.1): parse envelope + wire; full ingress pipeline (§4 steps 9-23)
- R-9 Auth middleware: API key bcrypt hash table + RBAC + setup_token bootstrap (§13.1)
- R-10 Repo binding validation: ed25519 signature verify + machine_id association (§16.1)
- R-11 Server reclassifier (lite): heuristic recomputation of task_type based on file_paths / language / diff_summary
- R-12 Policy engine: cel-go compilation + three-slot merge (§7 §8); fsnotify hot reload (reload-failure keeps the old version + alert)
- R-13 Routing engine: trace_id-derived ChaCha8 seed + weighted pick (§9.2); write routing_event.seed
- R-14 Provider adapter interface landed (§10) + two adapters: anthropic, openai_compat
- R-15 Egress: streaming SSE passthrough + inject aicg.usage at stream end (§5.4.3); non-streaming cost via header
- R-16 Budget service: Reserve / Commit / Release interfaces land (P0 soft-warn only, non-blocking); background settler scans timeouts every minute
- R-17 cmd/aicg-lp single binary: only exposes /anthropic/v1/messages + /_aicg/* (the OpenAI endpoint goes to P1)
- R-18 CLI subcommands: login --setup / login --invite / start / status / bind-repo / doctor / env / version
- R-19 Tagger v0: file_paths/language/diff_summary heuristics (no secret pre-scan; that lands in P1)
- R-20 GW client: optional mTLS / retry budget / first-token-flush state machine (§5.4.4)
- R-21 aicg.usage SSE event parsing + local ledger (~/.aicg/traces.db SQLite)
- R-22 Conformance test: spin up a mock Anthropic upstream + Claude Code simulator; run an end-to-end code_edit streaming request; assert (a) cost_event written; (b) routing_event.seed non-empty; (c) audit_event written; (d) aicg status reports this trace; (e) routingctl replay <trace_id> matches the original decision (evidence: tests/conformance/claude_code/e2e_dbbacked_test.go::TestE2EStreamingCodeEditDBBacked; when Docker is unavailable, fall back to the in-memory code-layer smoke test TestE2EStreamingCodeEdit)
- 1 internal dogfood team onboard for at least 1 week
- p95 GW latency (upstream excluded) < 150ms
- Availability 99% / 1 week
- Weekly report (aicg stats --by team --from 7d) output
- One full incident-replay drill (rebuild the path with routingctl replay)

# pricing.yaml
version: 1
prices:
- provider: anthropic
model: claude-opus-4-7
input_cents_per_mtok: 1500 # $15 / 1M tokens
output_cents_per_mtok: 7500
cache_write_cents_per_mtok: 1875
cache_read_cents_per_mtok: 150
- provider: anthropic
model: claude-sonnet-4-6
input_cents_per_mtok: 300
output_cents_per_mtok: 1500
- provider: openai
model: gpt-4o
input_cents_per_mtok: 250
output_cents_per_mtok: 1000
- provider: openrouter
model: deepseek/deepseek-chat
input_cents_per_mtok: 27
output_cents_per_mtok: 110
- provider: openai_compat
endpoint: ollama-cluster
model: qwen2.5-coder:7b
input_cents_per_mtok: 0
output_cents_per_mtok: 0
cost_source_default: estimated

GET /api/v1/policy/decision/explain
body:
{
"envelope": { ... omitted ... },
"server_class": { "task_type": "code_edit", "data_sensitivity": "medium" }
}
response:
{
"decision": {
"primary_action": "route",
"modifiers": [],
"side_effects": ["shadow_eval"],
"model_pool": "standard",
"shadow_pool": "strong",
"reasons": ["P-ROUTE-002", "P-EVAL-001"]
},
"checked_rules": [
{"id": "P-SEC-001", "matched": false, "trace": "envelope.repo.repo_id not in restricted_repos"},
{"id": "P-ROUTE-002", "matched": true},
{"id": "P-EVAL-001", "matched": true, "trace": "rand()=0.031 < 0.05"}
]
}

| task_type | data_sensitivity | Default pool | Covering rule |
|---|---|---|---|
| summary | low/med | cheap | P-ROUTE-001 |
| test_output | low/med | cheap | P-ROUTE-001 |
| repo_search | low/med | cheap | P-ROUTE-001 |
| file_reading | low | cheap | P-ROUTE-001 + override |
| file_reading | med | standard | P-ROUTE-002 |
| simple_edit | * | standard | P-ROUTE-002 |
| code_edit | * | standard | P-ROUTE-002 |
| planning | low/med | standard | P-ROUTE-002 |
| planning | high | private_strong | P-ROUTE-004 |
| architecture | * | strong | P-ROUTE-003 |
| debug | * | strong | P-ROUTE-003 |
| security_review | * | strong | P-ROUTE-003 |
| review | * | standard | P-ROUTE-002 |
| unknown | * | standard | defaults.on_no_match |
| any | high or restricted_repo | private_strong | P-ROUTE-004 |
| any | secret hit | block / redact | P-SEC-001 / P-SEC-002 |
End of design v0.2.4.