EffortlessMetrics/ub-review

ub-review

Targeted CI runner with review judgment automation for UB/native-boundary PR review.

ub-review is an intelligent PR CI gate: it decides what evidence a PR needs, runs the relevant proof, and turns the result into one concise review decision. It is not another generic PR-commenting bot and it is not fixed-job CI. It builds deterministic evidence packets, runs bounded BYOK model lanes when configured, validates inline comments, and submits one grouped Pull Request Review. It is optimized for cheap CI usage: one GitHub-hosted runner prepares shared context and advisory receipts once, then model lanes reason over the packet while local proof runs centrally.

First production preset:

bun-ub

The initial target is the Bun UB hunt. Other repo presets should be added after this one proves useful on real PRs. The Bun-specific operating handoff lives in docs/BUN_UB_HUNT.md, and real Bun packet behavior lives in examples/bun/packets/README.md. For other Rust repos, use docs/PORTING_BASELINE.md.

Start here: the adoption path

Four commands take a repo from curious to one required gate, each with receipts and nothing applied without your say-so:

ub-review init --profile gh-runner  # starter config plus ub-review-init.md:
                                    # file-driven setup guidance for the repo

ub-review audit-ci                  # read-only CI right-sizing report under
                                    # ci-audit/: which existing jobs actually
                                    # change merge decisions (receipts: run
                                    # history, costs, failure correlation)

ub-review setup-ci --print-pr       # render the migration PR - generated
                                    # .ub-review.toml, gate workflow, plan -
                                    # without writing or opening anything

ub-review setup-ci --open-pr \
  --accept <job>="<command>" \
  --action-sha <full-sha>           # open the migration PR: one branch,
                                    # four new files, one PR whose body is
                                    # the plan. Never touches branch
                                    # protection; refuses repos that already
                                    # have a .ub-review.toml

The migration ends with ub-review/gate as the repo's one required check: repo-mandated proofs run inside it, tool thresholds gate on receipts, and model lanes stay advisory until you opt in (the generated workflow ships at the zero-key tier, model-mode: off). What blocks vs. advises, which artifacts are stable to build automation on, and what this tool refuses to claim are specified per surface in the umbrella spec (surface table) and the artifact maturity table in SPEC-0004.

What it never claims: code correctness, UB-freedom, replacing security tooling, or model findings as proof. Missing evidence is recorded as missing evidence, never as clean evidence.

Advisory only, no required gate? The path above ends at one required check. To adopt ub-review as a non-blocking advisory reviewer instead (one grouped PR review, never blocks merge), see docs/ADOPTION_ADVISORY.md — two copy-paste files and one org secret. This is the recommended starting point for downstream Rust repos (e.g. perl-lsp-swarm, ripr-swarm) before calibrating toward a blocking gate.

Why this exists

Most review bots do this:

PR diff -> one generic LLM -> comments

ub-review does this:

PR diff
  -> targeted evidence plan
  -> relevant sensors/tools/tests
  -> lane-specific evidence packets
  -> MiniMax M3 review lanes by default
  -> proof receipts
  -> validated inline comments
  -> one grouped PR Review
  -> full artifacts

LLM tokens are cheap. CI runner time, disk, local analyzer fanout, and reviewer attention are the constraints. This action keeps CI doing the work traditional CI would do, but chooses it like a reviewer: plan evidence from the diff, run proof that can change the decision, and keep boilerplate in artifacts.

Copy/paste Bun setup

Create .github/workflows/ub-review-packet.yml in the Bun fork:

name: UB Review Packet

on:
  pull_request:
    types: [opened, ready_for_review]
    paths-ignore:
      - "**/*.md"
      - "docs/**"

permissions:
  contents: read
  pull-requests: write

jobs:
  packet:
    runs-on: ubuntu-latest
    timeout-minutes: 30

    steps:
      - uses: actions/checkout@v6
        with:
          fetch-depth: 0
          persist-credentials: false

      - name: Fetch PR base ref
        run: |
          set -euo pipefail
          git fetch --no-tags origin "+refs/heads/${{ github.base_ref }}:refs/remotes/origin/${{ github.base_ref }}"

      - name: Build UB review packet
        id: ub-review
        uses: EffortlessMetrics/ub-review@804d198b5a15a0df94bb4f43750dba71165916cd
        with:
          preset: bun-ub
          profile: gh-runner
          root: .
          base: origin/${{ github.base_ref }}
          head: HEAD
          out: target/ub-review
          install-tools: 'true'
          tool-bundle: core
          posting: review
          mode: review-byok
          github-token: ${{ github.token }}
          minimax-api-key: ${{ secrets.MINIMAX_API_KEY }}
          minimax-provider-kind: anthropic
          model-mode: auto
          provider-policy: minimax-only
          lane-width: '10'
          model-timeout-sec: '300'
          max-inline-comments: '8'
          model-concurrency: '8'
          max-model-calls: '14'
          fail-on-post-error: 'false'
          allow-heavy: 'false'

      - uses: actions/upload-artifact@v7
        if: always()
        with:
          name: ub-review-packet-${{ github.event.pull_request.number || github.run_id }}
          path: target/ub-review
          if-no-files-found: warn
          retention-days: 7

Sensor packet generation does not require secrets. Posting the grouped PR review uses the scoped github.token. The Bun v0 workflow uses direct MiniMax M3 for all 10 model lanes through secrets.MINIMAX_API_KEY. OpenCode Go remains an optional direct provider for later canary/deep modes through secrets.OPENCODE, but it is not part of the Bun v0 cutover workflow. ub-review does not shell out to OpenCode as an agent harness. GLM is skipped for v0. Missing model keys are recorded as missing review evidence instead of being treated as a clean run.

Use a full commit SHA for the Bun gate. The current known-good Bun pin is EffortlessMetrics/ub-review@804d198b5a15a0df94bb4f43750dba71165916cd, validated by EffortlessSteven/bun#49 with a successful UB evidence packet, terminal state sufficient, artifact-only PR body skip, uploaded artifact, tokmd receipts, and verifier pass. Do not float the Bun hunt on main; update the SHA only after this repo's verifier passes and the Bun consumer workflow succeeds.

After downloading the first Bun artifact, verify the packet contract before tagging:

python scripts/verify-bun-review-artifacts.py target/ub-review \
  --min-ok-model-lanes 10 \
  --require-no-model-evidence-failures

This check verifies the required packet tree, lane packets, review payload, post receipt, model receipts, no-LGTM invariant, and basic secret hygiene.

What it writes

target/ub-review/
  input/
    changed-files.txt
    diff.patch
    diff-context.json

  sensors/
    tokmd/
    cargo-allow/
    ripr/
    unsafe-review/
    ast-grep/
    actionlint/
    */ub-review-sensor-status.json

  lanes/
    ub.md
    source-route.md
    tests.md
    arch.md
    opposition.md
    security.md

  candidates/
    candidate-0000-abc123def456.json
    ...

  observations/
    tests-oracle.ndjson
    source-route.ndjson
    ...

  proof_requests/
    proof-001.json
    ...

  questions/
    tests-oracle/
      red-green.json
      ...
    orchestrator-follow-up/
      follow-up-001.json
      ...

  review/
    shared_context.md
    metrics.json
    review.json
    review.md
    candidates.json
    observations.json
    unique_observations.json
    merged_observations.json
    dropped_observations.json
    orchestrator_plan.json
    final_orchestrator_plan.json
    model_stages.json
    follow_up_results.json
    follow_up_outputs.json
    follow_up_evidence.json
    resolved_candidates.json
    final_compiler_input.json
    witnesses.json
    witness_registry.json
    proof_requests.json
    proof_request_groups.json
    proof_receipts.json
    proof_plan.md
    receipt_routes.json
    resource_leases.json
    resource_plan.md
    github-review.json
    github-review-skip.json
    post-result.json
    post-error.json
    github-review-post-payload.json
    post-stdout.json
    post-stderr.txt

  events.ndjson
  candidates.ndjson
  follow_up_questions.ndjson
  follow_up_results.ndjson
  follow_up_outputs.ndjson
  resolved_candidates.ndjson
  model_stages.ndjson
  witnesses.ndjson
  proof_requests.ndjson
  proof_receipts.ndjson
  receipt_routes.ndjson
  resource_leases.ndjson
  running-summary.md

Start with:

target/ub-review/running-summary.md
target/ub-review/lanes/tests.md
target/ub-review/lanes/ub.md
target/ub-review/input/diff.patch
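
Before opening a downloaded packet, a small existence check can confirm that these entry points are present. The following is a minimal sketch, not the bundled verifier (scripts/verify-bun-review-artifacts.py does the real contract check); the function name missing_entry_points is hypothetical, and only the four relative paths come from the list above.

```python
from pathlib import Path

# Entry points named in the "Start with" list, relative to the packet root.
ENTRY_POINTS = [
    "running-summary.md",
    "lanes/tests.md",
    "lanes/ub.md",
    "input/diff.patch",
]


def missing_entry_points(packet_root):
    """Return the entry-point paths that are absent under packet_root.

    Hypothetical helper for illustration; it only checks that files exist
    and says nothing about their contents.
    """
    root = Path(packet_root)
    return [rel for rel in ENTRY_POINTS if not (root / rel).is_file()]
```

Calling missing_entry_points("target/ub-review") returns an empty list when all four files exist. A non-empty result is a reason to rerun the workflow or inspect the upload step, not evidence about the PR itself.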

resolved_candidates reconciles review/candidates.json with review/follow_up_results.json and review/follow_up_outputs.json. It records unchanged, unresolved, unavailable, resolved, or conflicting candidate state after follow-up evidence; it is an audit receipt, not reviewer-facing text.
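
To get a feel for how candidates ended up after follow-up, the NDJSON receipt can be tallied by state. This is a rough sketch under an assumption: the key name "state" is hypothetical, since the record schema is not specified here. Only the five state names come from the description above.

```python
import json
from collections import Counter

# State names taken from the resolved_candidates description.
KNOWN_STATES = {"unchanged", "unresolved", "unavailable", "resolved", "conflicting"}


def tally_states(ndjson_text, state_field="state"):
    """Count candidate states in NDJSON text.

    `state_field` is an assumed key name, not a documented schema field.
    Records with a missing or unrecognized state are counted as "unknown"
    rather than silently dropped.
    """
    counts = Counter()
    for line in ndjson_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        state = record.get(state_field)
        counts[state if state in KNOWN_STATES else "unknown"] += 1
    return dict(counts)
```

Feeding it the contents of target/ub-review/resolved_candidates.ndjson gives a quick count per state. Bucketing unrecognized values as "unknown" keeps the tally in line with the rule that missing evidence is never reported as clean.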

Bun preset

The bun-ub preset loads profiles/bun-ub-v0.toml as the Bun review profile. The runtime profile (gh-runner, cx23, cx33, or cx43) is selected separately and supplies box budgets from runtime/*.toml.

The profile creates six lane packets:

| Lane | Purpose |
| --- | --- |
| ub | RAB, stale pointer/length, active view vs backing store, worker handoff |
| source-route | public API route, sibling paths, PR claim truth |
| tests | red/green proof, weak oracles, ASAN/witness posture |
| arch | boundary placement, helper shape, smallest complete fix |
| opposition | strongest correctness/test/perf/portability objection |
| security | UB as exploit primitive, memory corruption, leak/DoS/security framing |

Lane identity and model identity are separate. Static packet prefixes use lane names only; direct review mode records the provider/model separately in review.json and review.md. The Bun v0 direct model pass uses 10 lanes through direct MiniMax M3 with provider-policy: minimax-only. OpenCode Go canary/deep lanes remain available later through provider-policy: minimax-primary, opencode-go-canary, or opencode-go-wide once the provider key is proven.

Sensors

Default core sensors are best-effort:

  • tokmd for deterministic repository/diff packets and LLM-ready context;
  • cargo-allow for source-tree exception ledger drift;
  • ripr for Rust changed-behavior test-oracle weakness;
  • unsafe-review for Rust unsafe-contract reviewability;
  • ast-grep for cheap structural route scans;
  • actionlint for workflow changes.

Missing sensors are recorded as missing evidence. Missing evidence is never reported as clean evidence.

Heavy witnesses such as builds, tests, Miri, ASAN, and mutation testing are off by default. Enable them only behind explicit workflow policy.

Custom configs can mark a tool as required. The requirement applies only when the tool's trigger matches the current diff, so required workflow tools do not create evidence gaps on source-only PRs.

[tools.actionlint]
required = true

Rust unsafe evidence stack

For Rust repositories with an unsafe surface, unsafe-review is the third static evidence pillar beside cargo-allow and ripr:

| Tool | Review question |
| --- | --- |
| cargo-allow | Is this exception owned, scoped, evidenced, and not silently broadened? |
| ripr | Does changed behavior appear exposed to a meaningful oracle? |
| unsafe-review | Does changed unsafe code have reviewable safety evidence? |
| cargo-mutants | Do tests fail against concrete mutants? |
| Miri | Does this concrete execution hit UB? |
| Codecov | Did this code execute? |

unsafe-review asks whether an unsafe change has the safety contract, precondition guard, layout/alignment witness, aliasing/lifetime evidence, local test reach, and witness route needed for credible review. It is advisory by default and does not claim to prove soundness, UB-free status, or Miri cleanliness. See docs/UNSAFE_REVIEW_POLICY.md and docs/ci/unsafe-review.md for reusable repo guidance.

Review posting

ub-review run prepares evidence and review artifacts. ub-review post submits review/github-review.json as one GitHub Pull Request Review:

ub-review run --posting review --out target/ub-review
ub-review post --review-json target/ub-review/review/github-review.json

ub-review gate-check enforces a previously recorded gate verdict with the same fail-on-gate resolution run uses (auto enforces only for --mode intelligent-ci). The GitHub action's final Enforce gate outcome step calls it instead of re-implementing that logic in bash:

ub-review gate-check \
  --gate-outcome target/ub-review/review/gate_outcome.json \
  --fail-on-gate auto \
  --mode intelligent-ci
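
The fail-on-gate resolution described above is small enough to sketch. The function names, the literal "fail" conclusion string, and the 0/1 exit codes are assumptions for illustration; the real logic lives in ub-review gate-check.

```python
def resolve_fail_on_gate(fail_on_gate, mode):
    """Resolve the tri-state fail-on-gate setting to a boolean.

    "auto" enforces only for mode "intelligent-ci"; "true" and "false"
    are taken literally.
    """
    if fail_on_gate == "auto":
        return mode == "intelligent-ci"
    if fail_on_gate in ("true", "false"):
        return fail_on_gate == "true"
    raise ValueError(f"unsupported fail-on-gate value: {fail_on_gate!r}")


def gate_exit_code(conclusion, fail_on_gate, mode):
    """Return 1 when a recorded "fail" conclusion should fail the check, else 0.

    The conclusion string and the exit codes are assumed, not documented.
    """
    enforce = resolve_fail_on_gate(fail_on_gate, mode)
    return 1 if (enforce and conclusion == "fail") else 0
```

Under these assumptions, a recorded fail conclusion with fail-on-gate auto fails the check in intelligent-ci mode and passes through in review-byok mode.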

Inline comments are only emitted when they pass the diff-line guardrails: repo-relative path, valid RIGHT side line from the PR diff, actionable severity, high or medium-high confidence, concise body, lane prefix, and evidence or a disproof condition. Other candidates stay in review.md under summary-only findings.
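
The guardrails read naturally as a checklist over each candidate. The sketch below is illustrative only: the candidate keys, the severity labels, the 600-character body limit, and the "[lane]" prefix format are all assumptions, not the tool's schema. Only the list of guardrails and the two accepted confidence levels come from the paragraph above.

```python
from pathlib import PurePosixPath

ACTIONABLE_SEVERITIES = {"blocker", "major"}   # assumed labels
ACCEPTED_CONFIDENCE = {"high", "medium-high"}  # from the guardrail list
MAX_BODY_CHARS = 600                           # assumed limit for "concise"
LANES = {"ub", "source-route", "tests", "arch", "opposition", "security"}


def inline_rejections(candidate, right_side_lines):
    """Return the guardrails a candidate fails; an empty list means it may go inline.

    candidate: dict with assumed keys path, line, severity, confidence,
               body, evidence, disproof.
    right_side_lines: mapping of path -> set of RIGHT-side line numbers
                      present in the PR diff.
    """
    reasons = []
    path = candidate.get("path", "")
    posix = PurePosixPath(path)
    if not path or posix.is_absolute() or ".." in posix.parts:
        reasons.append("path")
    if candidate.get("line") not in right_side_lines.get(path, set()):
        reasons.append("line")
    if candidate.get("severity") not in ACTIONABLE_SEVERITIES:
        reasons.append("severity")
    if candidate.get("confidence") not in ACCEPTED_CONFIDENCE:
        reasons.append("confidence")
    body = candidate.get("body", "")
    if not body or len(body) > MAX_BODY_CHARS:
        reasons.append("body")
    if not any(body.startswith(f"[{lane}]") for lane in LANES):
        reasons.append("lane-prefix")
    if not (candidate.get("evidence") or candidate.get("disproof")):
        reasons.append("evidence")
    return reasons
```

Returning the failed guardrails instead of a bare boolean makes it easy to record why a candidate was demoted to a summary-only finding.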

Efficient CI stance

The intended cheap path is:

1 runner job
  checkout
  build packet
  run cheap sensors once
  upload artifact

Do not run many independent review jobs that rediscover the repository. This action builds shared context once, runs bounded model lanes over that context, validates inline candidates, and submits one grouped PR review when configured.

Inputs

| Input | Default | Meaning |
| --- | --- | --- |
| preset | bun-ub | Repo preset. |
| config | empty | Optional repo-local or absolute TOML config path; overrides preset when set. |
| profile | gh-runner | Box profile. |
| base | origin/main | Base ref. |
| head | HEAD | Head ref. |
| out | target/ub-review | Packet output directory. |
| tool-bundle | core | none, core, bun-fast, or full. |
| install-tools | true | Best-effort sensor install. |
| setup-rust | true | Select Rust 1.95 with rustup when available. |
| install-mode | auto | auto, release, source, or path. |
| binary-path | empty | Existing binary path for install-mode=path. |
| release-version | empty | Release tag for release downloads; empty lets tagged action refs provide the tag. |
| release-asset | ub-review-x86_64-unknown-linux-gnu.tar.gz | Linux x64 release archive asset. |
| allow-heavy | false | Permit heavy witness classes. |
| posting | review | review posts one Pull Request Review; artifact-only only writes files. |
| mode | review-byok | BYOK grouped review mode. intelligent-ci selects the required-gate product mode; legacy review-direct is accepted as an alias. |
| github-token | empty | Scoped token for posting=review. |
| minimax-api-key | empty | MiniMax M3 lane key. |
| minimax-api-url | empty | Optional MiniMax API URL override. |
| minimax-provider-kind | anthropic | MiniMax envelope, anthropic or openai. |
| minimax-model | MiniMax-M3 | MiniMax model name. |
| opencode-api-key | empty | OpenCode Go key for optional direct provider lanes. |
| opencode-api-url | empty | Optional OpenCode Go API URL override. |
| opencode-model | minimax-m3 | OpenCode Go canary model. |
| opencode-endpoint-kind | auto | auto, openai-chat, or anthropic-messages. |
| model-mode | auto | auto or off. |
| provider-policy | minimax-primary | minimax-primary, minimax-only, opencode-go-canary, opencode-go-wide, or auto. |
| lane-width | 10 | Bun model lane width: 6, 10, or 20. |
| model-timeout-sec | 300 | Per-model-call timeout. |
| max-inline-comments | 8 | Upper bound for validated inline comments. |
| model-concurrency | 8 | Planned model lane concurrency. |
| max-model-calls | 14 | Upper bound for model review calls. |
| review-body-max-bytes | 60000 | Maximum grouped review body size. |
| ledger-path | empty | Optional read-only UB ledger path. |
| ledger-max-bytes | 65536 | Maximum ledger context bytes. |
| fail-on-post-error | false | Fail the action when PR review posting fails. |
| fail-on-gate | auto | Gate enforcement: auto, true, or false. The action's final Enforce gate outcome step runs ub-review gate-check, which fails the check when review/gate_outcome.json records a fail conclusion and enforcement resolves to true; artifacts, the job summary, and PR review posting always complete first. auto resolves to true for mode=intelligent-ci and false otherwise. |
| github-summary | true | Append running summary to job summary. |

Repo Config Proof Policy

Custom configs can require proof in intelligent-ci mode. Matched requests are still routed through the central proof broker allowlist and runtime budget.

review_profile = "bun-ub-v0"
profile = "gh-runner"

[repo]
kind = "rust"

[[proof.required]]
id = "cargo-check"
languages = ["rust"]
diff_classes = ["source-general", "source-ub"]
command = "cargo check --workspace --locked"
reason = "Required Rust workspace check for intelligent CI."
cost = "focused-build"
timeout_sec = 300
required = true
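
One way to read a [[proof.required]] entry is as a filter over the current diff. The sketch below assumes a match needs the repo language to appear in languages and at least one diff class to overlap diff_classes; that matching rule and the function name are assumptions, and the broker allowlist and runtime budget mentioned above are not modeled.

```python
def matched_required_proofs(proofs, repo_language, diff_classes):
    """Return ids of required proofs whose language and diff class both match.

    proofs: list of dicts mirroring [[proof.required]] tables.
    The matching rule (language membership plus any overlapping diff class)
    is an assumed reading of the config, not documented behavior.
    """
    wanted = set(diff_classes)
    return [
        proof["id"]
        for proof in proofs
        if proof.get("required")
        and repo_language in proof.get("languages", [])
        and wanted & set(proof.get("diff_classes", []))
    ]
```

Under that reading, the cargo-check entry above would be requested for a Rust repo whose diff is classified source-ub, and skipped when no listed diff class applies.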

Outputs

| Output | Meaning |
| --- | --- |
| out | Output directory containing the full packet. |
| summary-path | running-summary.md. |
| events-path | Append-only events.ndjson. |
| review-json-path | Internal review/review.json. |
| metrics-json-path | Review metrics artifact. |
| github-review-path | Prepared grouped review payload. |
| post-result-path | Successful grouped review post receipt. |
| post-error-path | Grouped review post error receipt. |
| post-payload-path | Exact grouped review payload submitted to GitHub. |
| post-stdout-path | GitHub post response body artifact. |
| post-stderr-path | GitHub post stderr artifact. |
| gate-outcome-path | Deterministic gate verdict review/gate_outcome.json. |

Bootstrap note

With install-mode=auto, tagged action refs first try the Linux x64 release archive and fall back to a source build when the asset is unavailable. Commit SHA refs use the source build path. This keeps first adoption token-free and mechanically simple while leaving the faster release-binary path available for tagged rollouts. Explicit install-mode=release is strict: missing archives, missing checksum receipts, checksum mismatches, and unsupported runners fail instead of rebuilding from source. Use auto when fallback is acceptable. The consuming workflow can cache Cargo registry and target directories if needed.
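
The install-mode behavior amounts to a small decision table. The sketch below is a simplification with hypothetical names: release_asset_ok collapses "archive present, checksum receipt present, checksum matches, runner supported" into one flag, and how auto treats a checksum mismatch (as opposed to an unavailable asset) is not stated above, so it is treated as a fallback here by assumption.

```python
def install_plan(install_mode, ref_is_tag, release_asset_ok):
    """Return "release", "source", or "path" for the given inputs.

    install_mode: "auto", "release", "source", or "path"
    ref_is_tag: True for tagged action refs, False for commit SHA refs
    release_asset_ok: assumed single flag for a usable, verified archive
    Raises RuntimeError when strict release mode cannot proceed.
    """
    if install_mode == "path":
        return "path"
    if install_mode == "source":
        return "source"
    if install_mode == "release":
        # Strict: never rebuild from source when the archive is unusable.
        if not release_asset_ok:
            raise RuntimeError("release archive unavailable or unverified")
        return "release"
    if install_mode == "auto":
        # Tagged refs try the release archive first; SHA refs build from source.
        if ref_is_tag and release_asset_ok:
            return "release"
        return "source"
    raise ValueError(f"unsupported install-mode: {install_mode!r}")
```

This makes the trade-off visible: a SHA-pinned consumer such as the Bun gate always takes the source path under auto, so caching Cargo registry and target directories is where the time savings come from.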

Codex lane notes

Codex work should follow docs/CODEX_FINISH.md: one small green PR at a time, MiniMax M3 primary for v0, GLM skipped until approved, agent harnesses out of the hot path, and real sensor defects filed in the matching *-swarm repo instead of silently absorbed into ub-review.

Roadmap and calibration

Track the next steps in docs/ROADMAP.md. The roadmap records the v0 Bun smoke proof, cleanup work, PR body cleanup, profile extraction path, and the planned resource-aware orchestrator with proof and resource brokers. The PR-commentary standard is docs/REVIEW_BODY_CONTRACT.md: use the runner for evidence, and use the PR body only for decision-changing signal.

Use docs/calibration/bun-ub-review-ledger.md to record acted-on findings, false premises, parked follow-ups, and review compiler tuning notes from real Bun fork runs.

Local development

cargo generate-lockfile
cargo fmt --all --check
cargo check --workspace --all-targets --locked
cargo test --workspace --all-targets --locked
cargo clippy --workspace --all-targets --locked -- -D warnings
cargo doc --workspace --no-deps --locked

Rust style

  • Rust 2024
  • Rust 1.95 MSRV
  • unsafe_code = forbid
  • efficient CI gates
  • advisory by default
  • one grouped PR Review when posting is configured
  • no issue-comment spam or standalone lane posts

License

Two license files are present: LICENSE-APACHE and LICENSE-MIT.
