
Commit adf228b

Merge pull request #5566 from esengine/feature/context-contract-maintenance
Add context maintenance and tool contract guards / 增加上下文维护与工具合约保护
2 parents 0eb641c + d308d0d commit adf228b

34 files changed

Lines changed: 2166 additions & 165 deletions

README.md

Lines changed: 5 additions & 0 deletions
@@ -60,6 +60,9 @@
 two models together (executor + planner) in separate, cache-stable sessions.
 - **Plugin-driven.** External tools run as subprocesses over stdio JSON-RPC
 (MCP-compatible). Built-in tools self-register at compile time.
+- **Cache-aware context maintenance.** Startup injects a small stable environment
+summary, stale tool output is snipped/pruned before summary compaction, and the
+built-in tool schema contract is documented for regression review.
 - **Zero-friction distribution.** `CGO_ENABLED=0` single binary; cross-compile
 to six targets with one command. The only dependency is a TOML parser.

@@ -134,6 +137,8 @@ commands, `@` references, and two-model setup are all in the
 from the desktop app, then use approvals, YOLO, and commands from IM.
 - **[Spec](./docs/SPEC.md)** — engineering contract: architecture, registries,
 data types, and roadmap.
+- **[Tool contract](./docs/TOOL_CONTRACT.md)** — provider-visible built-in tool
+names, read-only flags, and schema snapshot guard.
 - **[Migrating from 0.x](./docs/MIGRATING.md)** — moving from the legacy
 TypeScript releases to the 1.0 Go rewrite.
 - **[Checkpoints & rewind](./docs/CHECKPOINTS.md)** — the snapshot-based edit

README.zh-CN.md

Lines changed: 4 additions & 0 deletions
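@@ -57,6 +57,8 @@
 endpoint is just one config entry. Optionally run two models together (executor + planner), each in its own separate, cache-stable session.
 - **Plugin-driven**: external tools run as subprocesses and communicate over stdio JSON-RPC (MCP-compatible);
 built-in tools self-register at compile time.
+- **Cache-friendly context maintenance**: startup injects a stable environment summary; stale tool output is first snipped/pruned,
+then goes through summary compaction; the built-in tool schema contract is protected by documentation and regression tests.
 - **Zero-friction distribution**: `CGO_ENABLED=0` single binary; one command cross-compiles to six target platforms.
 The only dependency is a TOML parsing library.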

@@ -124,6 +126,8 @@ runtime fallback for the provider key, but will still, within the current workspace scope, serve as
 - **[Bot guide](./docs/BOT_GUIDE.zh-CN.md)**: connect Feishu, Lark, and WeChat
 bots from the desktop app, plus approvals, YOLO, and command interaction in IM.
 - **[Spec](./docs/SPEC.md)**: engineering contract covering architecture, registries, data types, and roadmap.
+- **[Tool contract](./docs/TOOL_CONTRACT.zh-CN.md)**: provider-visible built-in tool names,
+read-only flags, and schema snapshot guard.
 - **[Migrating from 0.x](./docs/MIGRATING.md)**: moving from the legacy TypeScript releases to the 1.0 Go rewrite.
 - **[Checkpoints and rewind](./docs/CHECKPOINTS.md)**: the snapshot-based edit safety net
 (Esc-Esc / `/rewind`).

docs/GUIDE.md

Lines changed: 6 additions & 0 deletions
@@ -63,6 +63,7 @@ reasoning_language = "auto" # visible reasoning text: auto|zh|en
 # subagent_models = { review = "deepseek-pro", security_review = "deepseek-pro" }
 auto_plan = "off" # user-level only; off|on; off keeps plan mode manual
 # auto_plan_classifier = "deepseek-flash" # optional; only borderline tasks call it
+tool_result_snip_ratio = 0.6 # shorten stale tool output before summary compaction

 [[providers]]
 name = "deepseek-flash"
@@ -77,6 +78,11 @@ enabled = [] # omit/empty = all built-ins
 bash_timeout_seconds = 120 # foreground safety cap; set 0 for no tool-local cap
 mcp_call_timeout_seconds = 300 # default MCP call safety cap; per-plugin/tool overrides may raise it

+[environment]
+enabled = true # inject a stable startup summary of OS, shell, and common tools
+# [environment.tools]
+# go = "/opt/homebrew/bin/go" # optional explicit trusted path; workspace-local paths are not auto-executed
+
 [skills]
 # paths = ["~/my-skills", "../shared/skills"] # extra custom skill roots
 # excluded_paths = ["~/.agents/skills"] # hide convention roots without deleting folders

docs/GUIDE.zh-CN.md

Lines changed: 6 additions & 0 deletions
@@ -55,6 +55,7 @@ reasoning_language = "auto" # visible reasoning language: auto|zh|en
 # subagent_models = { review = "deepseek-pro", security_review = "deepseek-pro" }
 auto_plan = "off" # user-level only; off|on; off means plan mode is enabled manually only
 # auto_plan_classifier = "deepseek-flash" # optional; called only for borderline tasks
+tool_result_snip_ratio = 0.6 # shorten stale tool output before summary compaction

 [[providers]]
 name = "deepseek-flash"
@@ -69,6 +70,11 @@ enabled = [] # omitted/empty = all built-in tools
 bash_timeout_seconds = 120 # foreground safety cap; set to 0 for no tool-level timeout
 mcp_call_timeout_seconds = 300 # default safety cap for MCP calls; can be overridden per plugin/tool

+[environment]
+enabled = true # inject a stable summary of OS, shell, and common tools into the prompt at startup
+# [environment.tools]
+# go = "/opt/homebrew/bin/go" # optional: explicit trusted path; paths inside the workspace are not auto-executed at startup
+
 [skills]
 # paths = ["~/my-skills", "../shared/skills"] # extra custom skill directories
 # excluded_paths = ["~/.agents/skills"] # hide convention sources without deleting directories

docs/SPEC.md

Lines changed: 30 additions & 8 deletions
@@ -110,6 +110,9 @@ type Tool interface {
 (`tool.RegisterBuiltin(t)`); `tool.Builtins()` lists them.
 - A runtime `*Registry` is assembled per run: enabled built-ins (filtered by
 config) **plus** plugin-provided tools. The agent only sees the `*Registry`.
+- Tool schemas are canonicalized on registry insertion. The built-in contract is
+documented in [`TOOL_CONTRACT.md`](TOOL_CONTRACT.md) and backed by tests that
+compare the documented surface against the same canonical schema path.
 - `Execute` parses raw JSON args itself. Errors are returned, not fatal — the
 agent feeds them back so the model can self-correct.
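
The hunk above says tool schemas are canonicalized on registry insertion but does not show how. As a minimal sketch only, and not the project's actual code, one common way to get a byte-stable form is a decode/encode round trip through `encoding/json`, which sorts map keys when marshaling. The function name `canonicalSchema` is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// canonicalSchema (hypothetical name) re-encodes a JSON schema so that two
// schemas differing only in key order or whitespace yield identical bytes.
// It relies on encoding/json sorting map keys during Marshal.
// Caveat: numbers pass through float64, so very large integers may lose precision.
func canonicalSchema(raw []byte) (string, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return "", fmt.Errorf("invalid schema: %w", err)
	}
	out, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	a, _ := canonicalSchema([]byte(`{"type":"object","properties":{"path":{"type":"string"}}}`))
	b, _ := canonicalSchema([]byte(`{ "properties": {"path": {"type": "string"}}, "type": "object" }`))
	fmt.Println(a)
	fmt.Println(a == b)
}
```

Comparing canonical strings rather than raw bytes is what lets a documentation test and a runtime registry agree on "the same schema" regardless of how each side happened to order its keys.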

@@ -182,14 +185,25 @@ prefix cache-stable:
 Long tasks eventually fill the model's context window. Reasonix manages this with
 **low-frequency compaction** that respects the cache-first design:

-- Each provider declares its `context_window` (tokens). When a turn's reported
-`prompt_tokens` reach `compactRatio` (default `0.8`) of that window, the
-executor compacts **once** before the next turn.
-- Compaction folds only the assistant/tool work. Every **user turn** small
-enough to be a brief and every **prior digest** is kept verbatim; the foldable
-remainder is summarized — using the executor's own provider, no tools — in
-place. The boundary is aligned backward off any tool result so the recent tail
-never begins with an orphan tool message whose `tool_calls` were summarized away.
+- Each provider declares its `context_window` (tokens). Context maintenance is
+tiered: below `agent.tool_result_snip_ratio` (default `0.6`) the session is
+left untouched apart from the soft notice; at the snip ratio, stale tool
+results before the recent tail are archived and shortened with deterministic
+head/tail markers; at `agent.compact_ratio` (default `0.8`) stale tool results
+are archived and pruned to short placeholders before any summary call; only if
+pruning still leaves the prompt above the threshold does summary compaction
+run. At `agent.compact_force_ratio` (default `0.9`), the existing forced fold
+may proceed even when the fold economics would normally skip it.
+- Tool-result snip/prune never removes messages, so assistant `tool_calls` and
+tool results stay paired. `KeepErrors` preserves error/blocked tool outputs,
+and the recent tail is not rewritten. Snipped results can later be upgraded to
+pruned placeholders; already-pruned results are left alone.
+- When summary compaction runs, it folds only the assistant/tool work. Every
+**user turn** small enough to be a brief and every **prior digest** is kept
+verbatim; the foldable remainder is summarized — using the executor's own
+provider, no tools — in place. The boundary is aligned backward off any tool
+result so the recent tail never begins with an orphan tool message whose
+`tool_calls` were summarized away.
 - The dropped originals are archived under the user config dir
 (`reasonix/archive/<timestamp>.jsonl`; see §5 for its per-OS location), one
 message per line, so the full history stays traceable.
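
The tiered thresholds in the hunk above can be sketched as a small decision function plus a head/tail snip helper. This is an illustration under assumptions, not the commit's implementation: the names `Tier`, `Ratios`, `tierFor`, and `snip`, the line-based snipping, and the marker text are all hypothetical. Only the three ratios and their defaults (`0.6`, `0.8`, `0.9`) come from the spec text.

```go
package main

import (
	"fmt"
	"strings"
)

// Tier is a hypothetical label for the maintenance stages the spec describes.
type Tier int

const (
	TierNone  Tier = iota // below the snip ratio: session left untouched
	TierSnip              // shorten stale tool results with head/tail markers
	TierPrune             // prune to placeholders; summarize only if still over
	TierForce             // forced fold may proceed regardless of fold economics
)

// Ratios mirrors the three documented settings.
type Ratios struct{ Snip, Compact, Force float64 }

// DefaultRatios uses the defaults stated in the spec.
var DefaultRatios = Ratios{Snip: 0.6, Compact: 0.8, Force: 0.9}

// tierFor picks the highest tier whose threshold the prompt has reached.
func tierFor(promptTokens, contextWindow int, r Ratios) Tier {
	if contextWindow <= 0 {
		return TierNone
	}
	used := float64(promptTokens) / float64(contextWindow)
	switch {
	case used >= r.Force:
		return TierForce
	case used >= r.Compact:
		return TierPrune
	case used >= r.Snip:
		return TierSnip
	default:
		return TierNone
	}
}

// snip keeps the first and last `keep` lines and replaces the middle with a
// marker. The same input always yields the same output, so the rewritten
// message stays stable across turns.
func snip(output string, keep int) string {
	lines := strings.Split(output, "\n")
	if keep <= 0 || len(lines) <= 2*keep {
		return output // nothing worth cutting
	}
	marker := fmt.Sprintf("[... %d lines snipped ...]", len(lines)-2*keep)
	kept := append([]string{}, lines[:keep]...)
	kept = append(kept, marker)
	kept = append(kept, lines[len(lines)-keep:]...)
	return strings.Join(kept, "\n")
}

func main() {
	fmt.Println(tierFor(700_000, 1_000_000, DefaultRatios) == TierSnip)
	fmt.Println(snip("a\nb\nc\nd\ne", 1))
}
```

Note that `snip` rewrites a message's content but never drops the message itself, which matches the spec's point that `tool_calls` and tool results must stay paired.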
@@ -516,6 +530,14 @@ context_window = 1000000 # tokens; harness compacts older history near this li

 # A single-model entry still works for custom OpenAI-compatible endpoints.

+[environment]
+enabled = true # inject a stable startup summary of OS, shell, and common tool versions
+
+# Optional trusted executable paths shown to the model when PATH probing is not enough.
+# Workspace-local paths are listed but not auto-executed during startup probing.
+# [environment.tools]
+# go = "/opt/homebrew/bin/go"
+
 [tools]
 enabled = [] # omit/empty = all built-ins
 bash_timeout_seconds = 120 # foreground safety cap; set 0 for no tool-local cap
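
The `[environment]` table in the hunk above describes a stable startup summary in which workspace-local tool paths are listed but not executed. The sketch below shows one way such a summary could be built. It is an assumption-laden illustration: `summarize`, `insideWorkspace`, and the output format are hypothetical, and the real probing logic is not shown in this commit page.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// insideWorkspace reports whether path lies under the workspace root.
func insideWorkspace(workspace, path string) bool {
	rel, err := filepath.Rel(workspace, path)
	if err != nil {
		return false
	}
	return rel != ".." && !strings.HasPrefix(rel, ".."+string(filepath.Separator))
}

// summarize (hypothetical) renders OS, shell, and explicit tool paths.
// Tool names are sorted so the text is identical on every startup, which
// keeps the prompt prefix cache-stable. Workspace-local paths are only
// labeled here, never run.
func summarize(goos, shell, workspace string, tools map[string]string) string {
	names := make([]string, 0, len(tools))
	for name := range tools {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	fmt.Fprintf(&b, "os=%s shell=%s\n", goos, shell)
	for _, name := range names {
		note := "trusted"
		if insideWorkspace(workspace, tools[name]) {
			note = "workspace-local, not executed at startup"
		}
		fmt.Fprintf(&b, "%s=%s (%s)\n", name, tools[name], note)
	}
	return b.String()
}

func main() {
	fmt.Print(summarize("darwin", "zsh", "/work/proj", map[string]string{
		"go":   "/opt/homebrew/bin/go",
		"node": "/work/proj/bin/node",
	}))
}
```

Sorting matters because Go map iteration order is randomized; without it the summary text could change between runs and invalidate a cached prefix.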

docs/TOOL_CONTRACT.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+# Tool Contract
+
+<a href="./TOOL_CONTRACT.zh-CN.md">简体中文</a>
+
+This document records the provider-visible contract for Reasonix compile-time built-in tools. It is generated from the same canonical schema path used by the runtime registry.
+
+| Tool | Read-only | Description |
+| --- | --- | --- |
+| `bash` | false | Execute a command in the shell and return combined stdout/stderr. Use for builds, tests, git, package managers, etc. To search/read/list/edit/move files, prefer the dedicated tools (grep, read_file, ls, glob, edit_file, move_file) over shell grep/cat/ls/find/sed/mv/Move-Item - they behave identically on every OS. For symbol search or architecture questions, prefer LSP/read tools and targeted grep before shell commands. |
+| `bash_output` | true | Read new output from a background job started with bash(run_in_background=true) or task(run_in_background=true). Returns the output produced since the last bash_output call for that job, plus its status (running/done/failed/killed). Does not block. |
+| `code_index` | true | Lightweight built-in code symbol index. Prefer lsp_* for language semantics and installed code graph MCP tools for call graph, impact, and architecture relationships; use this as the local fallback for file outlines and symbol definition candidates, then verify with read_file or grep. |
+| `complete_step` | true | Record the evidence-backed completion of ONE step of an approved plan. Call it as you finish each step instead of silently moving on: it signs the step off with PROOF it is done - the verification you ran (command + result), the diff/files you changed, or a manual check. A completion with no evidence is REJECTED, so don't claim a step is done until you can show why. The host advances the task list for you when you sign off - it marks this step completed and moves the next to in_progress, so you don't need a separate todo_write to mark completions. Fields: `step` (which step - its title or number, matching the task list), `result` (what is now true/changed), `evidence` (>=1 item, each with `kind` = verification\|diff\|files\|manual and a `summary`, plus optional `command`/`paths`), and optional `notes`. |
+| `delete_range` | false | Delete a contiguous text range from a file using exact start/end text anchors. Each anchor must match exactly one line. Returns unified diff on success. Use for large deletions - smaller changes should use edit_file. |
+| `delete_symbol` | false | Delete a named symbol (function, method, type, interface, const, var) from a Go source file using AST parsing. For non-Go files, use delete_range with manual anchors. |
+| `edit_file` | false | Replace an exact string in a file with another. old_string must occur exactly once; add surrounding context to disambiguate. Use for targeted edits instead of rewriting the whole file. |
+| `glob` | true | Find files matching a glob pattern (e.g. "*.go", "internal/*/*.go", "**/*.test.ts"). Supports shell metacharacters * ? [] and the recursive ** pattern. |
+| `grep` | true | Search for a regular expression in a file, or recursively under a directory (skips hidden files and files matched by .gitignore). Returns matching lines as path:line:text, capped at 200 matches. |
+| `kill_shell` | false | Terminate a running background job (bash or task) started with run_in_background. A no-op if the job has already finished or the id is unknown. |
+| `ls` | true | List the entries of a directory. Directories are shown with a trailing slash; files show their byte size. Set recursive=true to list all nested files depth-first (skips .git/node_modules). |
+| `move_file` | false | Move or rename a file from source_path to destination_path. Creates the destination parent directory as needed. Use instead of shell mv, Move-Item, or ren for file moves so workspace confinement and file-edit permissions apply. |
+| `multi_edit` | false | Apply a list of edits to a single file atomically: each edit runs against the result of the previous one, all in memory; the file is rewritten only if every edit succeeds. Cheaper and safer than chaining edit_file calls - a failure in step 3 leaves the file untouched instead of half-edited. |
+| `notebook_edit` | false | Edit one cell of a Jupyter notebook (.ipynb). Target a cell by 0-based cell_number (or cell_id). edit_mode: "replace" (default) swaps the cell's source; "insert" adds a new cell after cell_number (use -1 to prepend at the top), taking cell_type and new_source; "delete" removes the cell. cell_type is "code" or "markdown" (required for insert). Editing a code cell clears its outputs. Prefer this over edit_file for notebooks - it keeps the JSON valid. |
+| `read_file` | true | Read a text file with optional line offset/limit. Output prefixes each line with its 1-based number so subsequent edit_file calls can target exact lines. Use `offset` and `limit` to page through large files; the tool reports total length and pagination hints in a trailer. |
+| `todo_write` | true | Record and update a structured task list for the current work. Send the COMPLETE list every call - it replaces the previous one. Use it to plan multi-step work and show progress: keep exactly one item in_progress at a time, and flip an item to completed the moment it's done (don't batch completions). Skip it for trivial single-step tasks. |
+| `wait` | true | Block until background jobs finish, then return each job's status and final output/answer. Use to collect the result of a task(run_in_background) or bash(run_in_background) before continuing. Omit job_ids to wait for every running job. |
+| `web_fetch` | true | Fetch a URL over HTTPS/HTTP and return its text content. HTML pages are reduced to readable text; JSON / plain text / markdown bodies come back verbatim. Use to read documentation pages, API responses, or source files hosted somewhere the local filesystem can't reach. |
+| `write_file` | false | Write content to a file at the given path (overwriting existing content). Creates parent directories as needed. |
+
+## Schema Snapshot
+
+The exact canonical schemas are intentionally tested in code rather than copied by hand here. Run:
+
+```bash
+go test ./internal/tool -run TestBuiltinToolContractDocumentation
+```
+
+The test checks that every registered built-in tool has a documented name, read-only flag, description row, and canonical schema generated by `tool.BuiltinContractEntries`.
+
+## Default Full Boot Surface
+
+In a default full-token boot, Reasonix sends the built-in tools above plus the
+session, memory, skill, subagent, LSP, install, and slash-command tools below:
+
+`ask`, `explore`, `forget`, `history`, `install_skill`, `install_source`,
+`list_sessions`, `lsp_definition`, `lsp_diagnostics`, `lsp_hover`,
+`lsp_references`, `memory`, `parallel_tasks`, `read_only_skill`,
+`read_only_task`, `read_session`, `read_skill`, `remember`, `research`,
+`review`, `run_skill`, `security_review`, `slash_command`, `task`.
+
+`internal/boot.TestBootToolContractMatchesProviderVisibleSurface` verifies the
+actual boot registry contract against the provider request, including read-only
+flags and canonical schemas.
+
+## Token Economy Boot Surface
+
+In token economy mode, Reasonix starts with the core coding/session/memory tools
+and the connector used to enable optional sources on demand:
+
+`ask`, `connect_tool_source`, `forget`, `history`, `list_sessions`, `memory`,
+`read_session`, `remember`, `slash_command`.
+
+Core built-in tools such as `bash`, `read_file`, `grep`, file writers, job tools,
+and `todo_write` remain available in economy mode and are listed in the built-in
+table above.

docs/TOOL_CONTRACT.zh-CN.md

Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
+# Tool Contract
+
+<a href="./TOOL_CONTRACT.md">English</a>
+
+This document records the provider-visible contract for Reasonix compile-time built-in tools. The runtime registry uses the same canonical schema path; tests verify that the tool names, read-only flags, and schema snapshot listed here do not drift.
+
+| Tool | Read-only | Description |
+| --- | --- | --- |
+| `bash` | false | Execute a shell command and return stdout/stderr. Use it for builds, tests, git, package managers, etc.; prefer the dedicated tools for reading, writing, and finding files. |
+| `bash_output` | true | Read the new output and status of a background `bash` or `task` job since the last read. |
+| `code_index` | true | Lightweight built-in code symbol index; prefer `lsp_*` or a code graph MCP, and use this as the fallback when they are unavailable. |
+| `complete_step` | true | Record, with evidence, the completion of one step of an approved plan. |
+| `delete_range` | false | Delete a contiguous range from a file using exact start/end text anchors. |
+| `delete_symbol` | false | Delete a named symbol from a Go source file using the Go AST. |
+| `edit_file` | false | Replace a unique exact string in a file with another string. |
+| `glob` | true | Find files matching a glob pattern. |
+| `grep` | true | Search text by regular expression in a file or under a directory. |
+| `kill_shell` | false | Terminate a background `bash` or `task` job. |
+| `ls` | true | List directory entries, optionally recursively. |
+| `move_file` | false | Move or rename a file. |
+| `multi_edit` | false | Atomically apply multiple edits to a single file. |
+| `notebook_edit` | false | Edit a single cell of a Jupyter notebook. |
+| `read_file` | true | Read a text file in a pageable, line-numbered format. |
+| `todo_write` | true | Record and replace the structured task list for the current work. |
+| `wait` | true | Wait for background jobs to finish and return their final output. |
+| `web_fetch` | true | Fetch the text content of a URL over HTTP/HTTPS. |
+| `write_file` | false | Write file content, creating parent directories as needed. |
+
+## Schema Snapshot
+
+The full canonical schemas are not hand-written in this document, to avoid manual drift between docs and code. Run:
+
+```bash
+go test ./internal/tool -run TestBuiltinToolContractDocumentation
+```
+
+The test uses `tool.BuiltinContractEntries` to verify that every built-in tool has a documentation row, a read-only flag, a non-empty description, and a canonical JSON schema.
+
+## Default Full Boot Surface
+
+A default full-token boot sends the built-in tools above, plus the session, memory, skill, subagent, LSP, install, and slash-command tools:
+
+`ask`, `explore`, `forget`, `history`, `install_skill`, `install_source`,
+`list_sessions`, `lsp_definition`, `lsp_diagnostics`, `lsp_hover`,
+`lsp_references`, `memory`, `parallel_tasks`, `read_only_skill`,
+`read_only_task`, `read_session`, `read_skill`, `remember`, `research`,
+`review`, `run_skill`, `security_review`, `slash_command`, `task`.
+
+`internal/boot.TestBootToolContractMatchesProviderVisibleSurface` verifies that the actual boot registry contract matches the provider request, including read-only flags and canonical schemas.
+
+## Token Economy Boot Surface
+
+Token economy mode starts with the core coding, session, and memory tools, plus the connector that enables optional sources on demand:
+
+`ask`, `connect_tool_source`, `forget`, `history`, `list_sessions`, `memory`,
+`read_session`, `remember`, `slash_command`.
+
+Core built-in tools such as `bash`, `read_file`, `grep`, file writers, background job tools, and `todo_write` remain available in economy mode; see the built-in tool table above.
