Targeted CI runner with review judgment automation for UB/native-boundary PR review.
ub-review is an intelligent PR CI gate: it decides what evidence a PR needs,
runs the relevant proof, and turns the result into one concise review decision.
It is not another generic PR-commenting bot and it is not fixed-job CI. It
builds deterministic evidence packets, runs bounded BYOK model lanes when
configured, validates inline comments, and submits one grouped Pull Request
Review. It is optimized for cheap CI usage: one GitHub-hosted runner prepares
shared context and advisory receipts once, then model lanes reason over the
packet while local proof runs centrally.
First production preset:
bun-ub
The initial target is the Bun UB hunt. Other repo presets should be added after this one proves useful on real PRs. The Bun-specific operating handoff lives in docs/BUN_UB_HUNT.md, and real Bun packet behavior lives in examples/bun/packets/README.md. For other Rust repos, use docs/PORTING_BASELINE.md.
Four commands take a repo from curious to one required gate, each with receipts and nothing applied without your say-so:

    ub-review init --profile gh-runner   # starter config plus ub-review-init.md:
                                         # file-driven setup guidance for the repo
    ub-review audit-ci                   # read-only CI right-sizing report under
                                         # ci-audit/: which existing jobs actually
                                         # change merge decisions (receipts: run
                                         # history, costs, failure correlation)
    ub-review setup-ci --print-pr        # render the migration PR - generated
                                         # .ub-review.toml, gate workflow, plan -
                                         # without writing or opening anything
    ub-review setup-ci --open-pr \
      --accept <job>="<command>" \
      --action-sha <full-sha>            # open the migration PR: one branch,
                                         # four new files, one PR whose body is
                                         # the plan. Never touches branch
                                         # protection; refuses repos that already
                                         # have a .ub-review.toml

The migration ends with ub-review/gate as the repo's one required check:
repo-mandated proofs run inside it, tool thresholds gate on receipts, and
model lanes stay advisory until you opt in (the generated workflow ships at
the zero-key tier, model-mode: off). What blocks vs. advises, which
artifacts are stable to build automation on, and what this tool refuses to
claim are specified per surface in
the umbrella spec
(surface table) and the artifact maturity table in
SPEC-0004.
What it never claims: code correctness, UB-freedom, replacing security tooling, or model findings as proof. Missing evidence is recorded as missing evidence, never as clean evidence.
Advisory only, no required gate? The path above ends at one required check. To adopt
ub-review as a non-blocking advisory reviewer instead (one grouped PR review, never blocks merge), see docs/ADOPTION_ADVISORY.md (two copy-paste files and one org secret). This is the recommended starting point for downstream Rust repos (e.g. perl-lsp-swarm, ripr-swarm) before calibrating toward a blocking gate.
Most review bots do this:

    PR diff -> one generic LLM -> comments

ub-review does this:

    PR diff
      -> targeted evidence plan
      -> relevant sensors/tools/tests
      -> lane-specific evidence packets
      -> MiniMax M3 review lanes by default
      -> proof receipts
      -> validated inline comments
      -> one grouped PR Review
      -> full artifacts

LLM tokens are cheap. CI runner time, disk, local analyzer fanout, and reviewer attention are the constraints. This action keeps CI doing the work traditional CI would do, but chooses it like a reviewer: plan evidence from the diff, run proof that can change the decision, and keep boilerplate in artifacts.
Create .github/workflows/ub-review-packet.yml in the Bun fork:

    name: UB Review Packet
    on:
      pull_request:
        types: [opened, ready_for_review]
        paths-ignore:
          - "**/*.md"
          - "docs/**"
    permissions:
      contents: read
      pull-requests: write
    jobs:
      packet:
        runs-on: ubuntu-latest
        timeout-minutes: 30
        steps:
          - uses: actions/checkout@v6
            with:
              fetch-depth: 0
              persist-credentials: false
          - name: Fetch PR base ref
            run: |
              set -euo pipefail
              git fetch --no-tags origin "+refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}"
          - name: Build UB review packet
            id: ub-review
            uses: EffortlessMetrics/ub-review@804d198b5a15a0df94bb4f43750dba71165916cd
            with:
              preset: bun-ub
              profile: gh-runner
              root: .
              base: origin/${{ github.base_ref }}
              head: HEAD
              out: target/ub-review
              install-tools: 'true'
              tool-bundle: core
              posting: review
              mode: review-byok
              github-token: ${{ github.token }}
              minimax-api-key: ${{ secrets.MINIMAX_API_KEY }}
              minimax-provider-kind: anthropic
              model-mode: auto
              provider-policy: minimax-only
              lane-width: '10'
              model-timeout-sec: '300'
              max-inline-comments: '8'
              model-concurrency: '8'
              max-model-calls: '14'
              fail-on-post-error: 'false'
              allow-heavy: 'false'
          - uses: actions/upload-artifact@v7
            if: always()
            with:
              name: ub-review-packet-${{ github.event.pull_request.number || github.run_id }}
              path: target/ub-review
              if-no-files-found: warn
              retention-days: 7

Sensor packet generation does not require secrets. Posting the grouped PR review
uses the scoped github.token. The Bun v0 workflow uses direct MiniMax M3 for
all 10 model lanes through secrets.MINIMAX_API_KEY. OpenCode Go remains an
optional direct provider for later canary/deep modes through secrets.OPENCODE,
but it is not part of the Bun v0 cutover workflow. ub-review does not shell
out to OpenCode as an agent harness. GLM is skipped for v0. Missing model keys
are recorded as missing review evidence instead of treated as a clean run.
Use a full commit SHA for the Bun gate. The current known-good Bun pin is
EffortlessMetrics/ub-review@804d198b5a15a0df94bb4f43750dba71165916cd,
validated by EffortlessSteven/bun#49 with a successful UB evidence packet,
terminal state sufficient, artifact-only PR body skip, uploaded artifact,
tokmd receipts, and verifier pass. Do not float the Bun hunt on main; update
the SHA only after this repo's verifier passes and the Bun consumer workflow
succeeds.
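
The pinning rule above can be checked mechanically before a workflow change lands. The following is a minimal sketch, not part of ub-review: the helper name `unpinned_uses` and the sample workflow text are assumptions made for illustration. It flags any `uses:` reference under a given owner prefix whose ref is not a full 40-character commit SHA.

```python
import re

# A `uses:` reference counts as pinned only when the ref is a full
# 40-character hexadecimal commit SHA (tags and branches can move).
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*(?P<action>[^@\s]+)@(?P<ref>\S+)")
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_uses(workflow_text: str, owner_prefix: str) -> list[str]:
    """Return `action@ref` strings under owner_prefix that are not SHA-pinned."""
    findings = []
    for line in workflow_text.splitlines():
        match = USES_RE.match(line)
        if not match or not match["action"].startswith(owner_prefix):
            continue
        if not FULL_SHA_RE.fullmatch(match["ref"]):
            findings.append(f'{match["action"]}@{match["ref"]}')
    return findings


# Hypothetical workflow excerpt: one pinned reference and one floating on main.
workflow = """
steps:
  - uses: actions/checkout@v6
  - name: Build UB review packet
    uses: EffortlessMetrics/ub-review@804d198b5a15a0df94bb4f43750dba71165916cd
  - uses: EffortlessMetrics/ub-review@main
"""
print(unpinned_uses(workflow, "EffortlessMetrics/ub-review"))
```

A line-based regex is enough for a quick pre-merge check; a stricter variant would parse the YAML so that commented-out steps and multi-document files are handled correctly.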
After downloading the first Bun artifact, verify the packet contract before tagging:

    python scripts/verify-bun-review-artifacts.py target/ub-review \
      --min-ok-model-lanes 10 \
      --require-no-model-evidence-failures

This check verifies the required packet tree, lane packets, review payload, post receipt, model receipts, no-LGTM invariant, and basic secret hygiene.
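
For a quick local look before running the real verifier, the packet-tree part of that check can be approximated in a few lines. This is a rough sketch only: `EXPECTED` is an illustrative subset of paths taken from the packet layout in this README, not the list that scripts/verify-bun-review-artifacts.py enforces, and `missing_packet_files` is a hypothetical helper name.

```python
from pathlib import Path
import tempfile

LANES = ["ub", "source-route", "tests", "arch", "opposition", "security"]

# Illustrative subset of the packet tree. The real verifier's required list
# is not reproduced here and may differ.
EXPECTED = [
    "running-summary.md",
    "input/changed-files.txt",
    "input/diff.patch",
    "review/review.json",
] + [f"lanes/{lane}.md" for lane in LANES]


def missing_packet_files(packet_root: Path) -> list[str]:
    """Return expected relative paths that are absent or empty."""
    missing = []
    for rel in EXPECTED:
        path = packet_root / rel
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(rel)
    return missing


# Demo against a throwaway directory that contains only part of the tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel in ["running-summary.md", "input/diff.patch", "lanes/ub.md"]:
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("placeholder\n")
    for rel in missing_packet_files(root):
        print("missing:", rel)
```

Empty files are reported as missing on purpose, in line with the rule that absent evidence is never treated as clean evidence.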

    target/ub-review/
      input/
        changed-files.txt
        diff.patch
        diff-context.json
      sensors/
        tokmd/
        cargo-allow/
        ripr/
        unsafe-review/
        ast-grep/
        actionlint/
        */ub-review-sensor-status.json
      lanes/
        ub.md
        source-route.md
        tests.md
        arch.md
        opposition.md
        security.md
      candidates/
        candidate-0000-abc123def456.json
        ...
      observations/
        tests-oracle.ndjson
        source-route.ndjson
        ...
      proof_requests/
        proof-001.json
        ...
      questions/
        tests-oracle/
          red-green.json
          ...
        orchestrator-follow-up/
          follow-up-001.json
          ...
      review/
        shared_context.md
        metrics.json
        review.json
        review.md
        candidates.json
        observations.json
        unique_observations.json
        merged_observations.json
        dropped_observations.json
        orchestrator_plan.json
        final_orchestrator_plan.json
        model_stages.json
        follow_up_results.json
        follow_up_outputs.json
        follow_up_evidence.json
        resolved_candidates.json
        final_compiler_input.json
        witnesses.json
        witness_registry.json
        proof_requests.json
        proof_request_groups.json
        proof_receipts.json
        proof_plan.md
        receipt_routes.json
        resource_leases.json
        resource_plan.md
        github-review.json
        github-review-skip.json
        post-result.json
        post-error.json
        github-review-post-payload.json
        post-stdout.json
        post-stderr.txt
      events.ndjson
      candidates.ndjson
      follow_up_questions.ndjson
      follow_up_results.ndjson
      follow_up_outputs.ndjson
      resolved_candidates.ndjson
      model_stages.ndjson
      witnesses.ndjson
      proof_requests.ndjson
      proof_receipts.ndjson
      receipt_routes.ndjson
      resource_leases.ndjson
      running-summary.md

Start with:
target/ub-review/running-summary.md
target/ub-review/lanes/tests.md
target/ub-review/lanes/ub.md
target/ub-review/input/diff.patch
resolved_candidates reconciles review/candidates.json with
review/follow_up_results.json and review/follow_up_outputs.json. It records
unchanged, unresolved, unavailable, resolved, or conflicting candidate state
after follow-up evidence; it is an audit receipt, not reviewer-facing text.
The bun-ub preset loads profiles/bun-ub-v0.toml as the Bun review profile.
The runtime profile (gh-runner, cx23, cx33, or cx43) supplies box
budgets separately, from runtime/*.toml.
The profile creates six lane packets:

| Lane | Purpose |
|---|---|
| ub | RAB, stale pointer/length, active view vs backing store, worker handoff |
| source-route | public API route, sibling paths, PR claim truth |
| tests | red/green proof, weak oracles, ASAN/witness posture |
| arch | boundary placement, helper shape, smallest complete fix |
| opposition | strongest correctness/test/perf/portability objection |
| security | UB as exploit primitive, memory corruption, leak/DoS/security framing |

Lane identity and model identity are separate. Static packet prefixes use lane
names only; direct review mode records the provider/model separately in
review.json and review.md. The Bun v0 direct model pass uses 10 lanes through
direct MiniMax M3 with provider-policy: minimax-only. OpenCode Go canary/deep
lanes remain available later through provider-policy: minimax-primary,
opencode-go-canary, or opencode-go-wide once the provider key is proven.
Default core sensors are best-effort:

- tokmd for deterministic repository/diff packets and LLM-ready context;
- cargo-allow for source-tree exception ledger drift;
- ripr for Rust changed-behavior test-oracle weakness;
- unsafe-review for Rust unsafe-contract reviewability;
- ast-grep for cheap structural route scans;
- actionlint for workflow changes.

Missing sensors are recorded as missing evidence. Missing evidence is never reported as clean evidence.
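
One way to keep that rule honest in downstream tooling is to model evidence as three states rather than a pass/fail boolean. The sketch below illustrates the idea only: the `ran` and `finding_count` fields are hypothetical and are not the schema of ub-review-sensor-status.json.

```python
from enum import Enum
from typing import Optional


class Evidence(Enum):
    CLEAN = "clean"
    FINDINGS = "findings"
    MISSING = "missing"


def classify(status: Optional[dict]) -> Evidence:
    """Map a sensor status record to an evidence state.

    `ran` and `finding_count` are hypothetical field names used for this
    sketch; they are not the real sensor status schema.
    """
    # No record, a sensor that did not run, or an unknown finding count all
    # mean the evidence is missing. None of them may collapse into CLEAN.
    if status is None or not status.get("ran", False):
        return Evidence.MISSING
    count = status.get("finding_count")
    if count is None:
        return Evidence.MISSING
    return Evidence.FINDINGS if count > 0 else Evidence.CLEAN


sensors = {
    "tokmd": {"ran": True, "finding_count": 0},
    "ripr": {"ran": True, "finding_count": 2},
    "unsafe-review": {"ran": False},
    "actionlint": None,
}
for name, status in sensors.items():
    print(f"{name}: {classify(status).value}")
```

The design point is that CLEAN is only reachable through a positive signal (the sensor ran and reported zero findings); every uncertain path falls through to MISSING.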
Heavy witnesses such as builds, tests, Miri, ASAN, and mutation testing are off by default. Enable them only behind explicit workflow policy.
Custom configs can mark a tool as required. The requirement applies only when the tool's trigger matches the current diff, so required workflow tools do not create evidence gaps on source-only PRs.

    [tools.actionlint]
    required = true

For Rust repositories with an unsafe surface, unsafe-review is the third
static evidence pillar beside cargo-allow and ripr:

| Tool | Review question |
|---|---|
| cargo-allow | Is this exception owned, scoped, evidenced, and not silently broadened? |
| ripr | Does changed behavior appear exposed to a meaningful oracle? |
| unsafe-review | Does changed unsafe code have reviewable safety evidence? |
| cargo-mutants | Do tests fail against concrete mutants? |
| Miri | Does this concrete execution hit UB? |
| Codecov | Did this code execute? |

unsafe-review asks whether an unsafe change has the safety contract,
precondition guard, layout/alignment witness, aliasing/lifetime evidence, local
test reach, and witness route needed for credible review. It is advisory by
default and does not claim to prove soundness, UB-free status, or Miri
cleanliness. See docs/UNSAFE_REVIEW_POLICY.md
and docs/ci/unsafe-review.md for reusable repo
guidance.
ub-review run prepares evidence and review artifacts. ub-review post submits
review/github-review.json as one GitHub Pull Request Review:

    ub-review run --posting review --out target/ub-review
    ub-review post --review-json target/ub-review/review/github-review.json

ub-review gate-check enforces a previously recorded gate verdict with the
same fail-on-gate resolution that ub-review run uses (auto enforces only for
--mode intelligent-ci). The GitHub action's final Enforce gate outcome
step calls it instead of re-implementing that logic in bash:

    ub-review gate-check \
      --gate-outcome target/ub-review/review/gate_outcome.json \
      --fail-on-gate auto \
      --mode intelligent-ci

Inline comments are only emitted when they pass the diff-line guardrails:
repo-relative path, valid RIGHT side line from the PR diff, actionable
severity, high or medium-high confidence, concise body, lane prefix, and
evidence or a disproof condition. Other candidates stay in review.md under
summary-only findings.
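
The guardrails above amount to a filter over candidate comments. The sketch below shows one plausible shape for such a filter, assuming git-style unified diffs. It is not the ub-review implementation: the `Candidate` fields, the `[lane]` prefix format, the severity labels, and the 600-character body limit are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass

HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def right_side_lines(diff_text: str) -> dict[str, set[int]]:
    """Map each changed path to the line numbers on the RIGHT side of a git diff."""
    result: dict[str, set[int]] = {}
    path = None
    new_line = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            path, in_hunk = None, False
        elif not in_hunk and line.startswith("+++ "):
            target = line[4:].strip()
            # "+++ /dev/null" marks a deleted file, which has no RIGHT side.
            path = target[2:] if target.startswith("b/") else None
        elif line.startswith("@@"):
            match = HUNK_RE.match(line)
            if match and path is not None:
                new_line, in_hunk = int(match.group(1)), True
                result.setdefault(path, set())
        elif in_hunk and path is not None and line[:1] in ("+", " "):
            # Added and context lines both exist in the new file version.
            result[path].add(new_line)
            new_line += 1
    return result


@dataclass
class Candidate:
    path: str
    line: int
    severity: str
    confidence: str
    body: str
    evidence: str = ""
    disproof_condition: str = ""


LANES = ("ub", "source-route", "tests", "arch", "opposition", "security")
ACTIONABLE = {"high", "medium"}      # hypothetical severity labels
CONFIDENT = {"high", "medium-high"}  # from the guardrail description
MAX_BODY_CHARS = 600                 # hypothetical "concise" limit


def rejection_reasons(c: Candidate, right: dict[str, set[int]]) -> list[str]:
    """Return every guardrail the candidate fails; empty means it may go inline."""
    reasons = []
    if c.path.startswith("/") or ".." in c.path.split("/"):
        reasons.append("path is not repo-relative")
    if c.line not in right.get(c.path, set()):
        reasons.append("line is not on the RIGHT side of the diff")
    if c.severity not in ACTIONABLE:
        reasons.append("severity is not actionable")
    if c.confidence not in CONFIDENT:
        reasons.append("confidence is below medium-high")
    if len(c.body) > MAX_BODY_CHARS:
        reasons.append("body is not concise")
    if not any(c.body.startswith(f"[{lane}]") for lane in LANES):
        reasons.append("missing lane prefix")
    if not (c.evidence or c.disproof_condition):
        reasons.append("no evidence or disproof condition")
    return reasons


DIFF = "\n".join([
    "diff --git a/src/buf.rs b/src/buf.rs",
    "--- a/src/buf.rs",
    "+++ b/src/buf.rs",
    "@@ -10,3 +10,4 @@ fn view()",
    " let len = buf.len();",
    "-let ptr = buf.as_ptr();",
    "+let ptr = buf.as_mut_ptr();",
    "+debug_assert!(len <= cap);",
    " use_view(ptr, len);",
])
right = right_side_lines(DIFF)
good = Candidate("src/buf.rs", 12, "high", "high",
                 "[ub] Length check runs after the pointer is taken.",
                 evidence="src/buf.rs:11-12")
bad = Candidate("src/buf.rs", 40, "high", "medium", "Consider renaming.")
for cand in (good, bad):
    print(cand.line, rejection_reasons(cand, right) or "inline")
```

Returning the full list of failed guardrails, rather than stopping at the first, makes it straightforward to record why a candidate was demoted to a summary-only finding.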
The intended cheap path is:

    1 runner job
      checkout
      build packet
      run cheap sensors once
      upload artifact

Do not run many independent review jobs that rediscover the repository. This action builds shared context once, runs bounded model lanes over that context, validates inline candidates, and submits one grouped PR review when configured.

| Input | Default | Meaning |
|---|---|---|
| preset | bun-ub | Repo preset. |
| config | empty | Optional repo-local or absolute TOML config path; overrides preset when set. |
| profile | gh-runner | Box profile. |
| base | origin/main | Base ref. |
| head | HEAD | Head ref. |
| out | target/ub-review | Packet output directory. |
| tool-bundle | core | none, core, bun-fast, or full. |
| install-tools | true | Best-effort sensor install. |
| setup-rust | true | Select Rust 1.95 with rustup when available. |
| install-mode | auto | auto, release, source, or path. |
| binary-path | empty | Existing binary path for install-mode=path. |
| release-version | empty | Release tag for release downloads; empty lets tagged action refs provide the tag. |
| release-asset | ub-review-x86_64-unknown-linux-gnu.tar.gz | Linux x64 release archive asset. |
| allow-heavy | false | Permit heavy witness classes. |
| posting | review | review posts one Pull Request Review; artifact-only only writes files. |
| mode | review-byok | BYOK grouped review mode. intelligent-ci selects the required-gate product mode; legacy review-direct is accepted as an alias. |
| github-token | empty | Scoped token for posting=review. |
| minimax-api-key | empty | MiniMax M3 lane key. |
| minimax-api-url | empty | Optional MiniMax API URL override. |
| minimax-provider-kind | anthropic | MiniMax envelope, anthropic or openai. |
| minimax-model | MiniMax-M3 | MiniMax model name. |
| opencode-api-key | empty | OpenCode Go key for optional direct provider lanes. |
| opencode-api-url | empty | Optional OpenCode Go API URL override. |
| opencode-model | minimax-m3 | OpenCode Go canary model. |
| opencode-endpoint-kind | auto | auto, openai-chat, or anthropic-messages. |
| model-mode | auto | auto or off. |
| provider-policy | minimax-primary | minimax-primary, minimax-only, opencode-go-canary, opencode-go-wide, or auto. |
| lane-width | 10 | Bun model lane width: 6, 10, or 20. |
| model-timeout-sec | 300 | Per-model-call timeout. |
| max-inline-comments | 8 | Upper bound for validated inline comments. |
| model-concurrency | 8 | Planned model lane concurrency. |
| max-model-calls | 14 | Upper bound for model review calls. |
| review-body-max-bytes | 60000 | Maximum grouped review body size. |
| ledger-path | empty | Optional read-only UB ledger path. |
| ledger-max-bytes | 65536 | Maximum ledger context bytes. |
| fail-on-post-error | false | Fail the action when PR review posting fails. |
| fail-on-gate | auto | Gate enforcement: auto, true, or false. The action's final Enforce gate outcome step runs ub-review gate-check, which fails the check when review/gate_outcome.json records a fail conclusion and enforcement resolves to true; artifacts, the job summary, and PR review posting always complete first. auto resolves to true for mode=intelligent-ci and false otherwise. |
| github-summary | true | Append running summary to job summary. |

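
The fail-on-gate resolution described in the inputs table is small enough to state as code. This is a sketch of the documented behavior, not the gate-check source: the function names are made up, and `conclusion` is a hypothetical field name standing in for however review/gate_outcome.json records a fail verdict.

```python
def resolve_enforcement(fail_on_gate: str, mode: str) -> bool:
    """Resolve fail-on-gate (auto, true, or false) to a concrete decision."""
    if fail_on_gate == "true":
        return True
    if fail_on_gate == "false":
        return False
    if fail_on_gate == "auto":
        # auto enforces only in the required-gate product mode.
        return mode == "intelligent-ci"
    raise ValueError(f"unknown fail-on-gate value: {fail_on_gate}")


def gate_exit_code(gate_outcome: dict, fail_on_gate: str, mode: str) -> int:
    """Return 1 only when the recorded verdict is a fail and enforcement is on."""
    # `conclusion` is a hypothetical field name for this sketch.
    failed = gate_outcome.get("conclusion") == "fail"
    return 1 if failed and resolve_enforcement(fail_on_gate, mode) else 0


outcome = {"conclusion": "fail"}
for mode in ("review-byok", "intelligent-ci"):
    print(mode, gate_exit_code(outcome, "auto", mode))
```

Keeping the resolution in one function is what lets the run command and the final enforcement step agree on the outcome instead of each re-deriving it.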
Custom configs can require proof in intelligent-ci mode. Matched requests are
still routed through the central proof broker allowlist and runtime budget.

    review_profile = "bun-ub-v0"
    profile = "gh-runner"

    [repo]
    kind = "rust"

    [[proof.required]]
    id = "cargo-check"
    languages = ["rust"]
    diff_classes = ["source-general", "source-ub"]
    command = "cargo check --workspace --locked"
    reason = "Required Rust workspace check for intelligent CI."
    cost = "focused-build"
    timeout_sec = 300
    required = true

| Output | Meaning |
|---|---|
| out | Output directory containing the full packet. |
| summary-path | running-summary.md. |
| events-path | Append-only events.ndjson. |
| review-json-path | Internal review/review.json. |
| metrics-json-path | Review metrics artifact. |
| github-review-path | Prepared grouped review payload. |
| post-result-path | Successful grouped review post receipt. |
| post-error-path | Grouped review post error receipt. |
| post-payload-path | Exact grouped review payload submitted to GitHub. |
| post-stdout-path | GitHub post response body artifact. |
| post-stderr-path | GitHub post stderr artifact. |
| gate-outcome-path | Deterministic gate verdict review/gate_outcome.json. |

With install-mode=auto, tagged action refs first try the Linux x64 release
archive and fall back to a source build when the asset is unavailable. Commit
SHA refs use the source build path. This keeps first adoption token-free and
mechanically simple while leaving the faster release-binary path available for
tagged rollouts. Explicit install-mode=release is strict: missing archives,
missing checksum receipts, checksum mismatches, and unsupported runners fail
instead of rebuilding from source. Use auto when fallback is acceptable. The
consuming workflow can cache Cargo registry and target directories if needed.
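
The install-mode rules in the paragraph above can be summarized as a small decision function. This is a sketch under stated assumptions, not the action's installer: `plan_install` is a hypothetical name, `ref_is_tag` stands for "the action ref is a tag", and `release_ok` collapses archive presence, checksum receipt, checksum match, and runner support into one flag. Treating every non-tag ref like a commit SHA in auto mode is also an assumption.

```python
def plan_install(install_mode: str, ref_is_tag: bool, release_ok: bool) -> str:
    """Return the install path to take: "release", "source", or "path"."""
    if install_mode == "path":
        return "path"  # use the existing binary named by binary-path
    if install_mode == "source":
        return "source"
    if install_mode == "release":
        # Strict: never rebuild from source when the release cannot be used.
        if not release_ok:
            raise RuntimeError("release install failed and fallback is disabled")
        return "release"
    if install_mode == "auto":
        # Tagged refs try the release archive first and fall back to source;
        # commit SHA refs go straight to the source build.
        return "release" if ref_is_tag and release_ok else "source"
    raise ValueError(f"unknown install-mode: {install_mode}")


print(plan_install("auto", ref_is_tag=True, release_ok=True))
print(plan_install("auto", ref_is_tag=True, release_ok=False))
print(plan_install("auto", ref_is_tag=False, release_ok=True))
```

The only branch that raises on an unusable release is the explicit release mode, which matches the strict-versus-fallback split the paragraph describes.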
Codex work should follow docs/CODEX_FINISH.md: one
small green PR at a time, MiniMax M3 primary for v0, GLM skipped until
approved, agent harnesses out of the hot path, and real sensor defects filed in
the matching *-swarm repo instead of silently absorbed into ub-review.
Track the next steps in docs/ROADMAP.md. The roadmap records the v0 Bun smoke proof, cleanup work, PR body cleanup, profile extraction path, and the planned resource-aware orchestrator with proof and resource brokers. The PR-commentary standard is docs/REVIEW_BODY_CONTRACT.md: use the runner for evidence, and use the PR body only for decision-changing signal.
Use docs/calibration/bun-ub-review-ledger.md to record acted-on findings, false premises, parked follow-ups, and review compiler tuning notes from real Bun fork runs.

    cargo generate-lockfile
    cargo fmt --all --check
    cargo check --workspace --all-targets --locked
    cargo test --workspace --all-targets --locked
    cargo clippy --workspace --all-targets --locked -- -D warnings
    cargo doc --workspace --no-deps --locked

- Rust 2024
- Rust 1.95 MSRV
- unsafe_code = forbid

- efficient CI gates
- advisory by default
- one grouped PR Review when posting is configured
- no issue-comment spam or standalone lane posts