Manual or scripted checks that mirror the comprehensive test plan. Automated coverage lives under pipeline/tests/test_trust_hardening_*.py, test_llm_proposals.py, test_llm_lean_proposals.py, and test_benchmark_runner.py.
- Create a paper with `metadata.json` and no `claims.json` (or `claims.json` as `[]`).
- Run extract-claims with `--mode scaffold_only` (or default).
- Expect a non-empty `claims.json` with the placeholder claim, `extraction_run.json` containing `extraction_mode` and `placeholder_claim_written: true`.

- Same paper layout as A, but run extract-claims with `--mode deterministic` (or `llm_sidecar`) and no pre-existing claims file.
- Expect no placeholder claim: after normalization, `claims.json` is an empty array `[]`, and `placeholder_claim_written` is false in `extraction_run.json`.
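The expectations in the two extract-claims checks above can be scripted. The following is a minimal sketch, not the pipeline's own test code: the function names are hypothetical, and it assumes that `claims.json` is a top-level JSON array and that `extraction_run.json` is a JSON object sitting next to it in the paper directory.

```python
import json
from pathlib import Path


def check_extraction_outcome(claims, run, expect_placeholder):
    """Return a list of problems; an empty list means the check passed.

    claims: parsed claims.json (assumed to be a top-level JSON array).
    run: parsed extraction_run.json (assumed to be a JSON object).
    expect_placeholder: True for the scaffold check, False otherwise.
    """
    problems = []
    if "extraction_mode" not in run:
        problems.append("extraction_run.json has no extraction_mode")
    if bool(run.get("placeholder_claim_written", False)) != expect_placeholder:
        problems.append(f"placeholder_claim_written should be {expect_placeholder}")
    if expect_placeholder and not claims:
        problems.append("claims.json is empty; expected the placeholder claim")
    if not expect_placeholder and claims != []:
        problems.append("claims.json should be []")
    return problems


def check_paper_dir(paper_dir, expect_placeholder):
    """Load both files from paper_dir (assumed layout) and run the check."""
    paper_dir = Path(paper_dir)
    claims = json.loads((paper_dir / "claims.json").read_text(encoding="utf-8"))
    run = json.loads((paper_dir / "extraction_run.json").read_text(encoding="utf-8"))
    return check_extraction_outcome(claims, run, expect_placeholder)
```

Call `check_paper_dir(path, expect_placeholder=True)` after a `scaffold_only` run and with `expect_placeholder=False` after a `deterministic` or `llm_sidecar` run; any returned strings describe what did not match.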

- In `claims.json`, set `linked_assumptions` / `linked_symbols` to mix valid IDs and unknown IDs (or use `_unresolved` fields).
- Run `normalize-paper` (or extract-claims, which runs normalize).
- Expect valid IDs on `linked_assumptions` / `linked_symbols`, unknown IDs on `linked_assumptions_unresolved` / `linked_symbols_unresolved`, and matching fields in `normalization_report.json`.
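The split described above amounts to partitioning each link list by membership in a set of known IDs. This sketch illustrates that idea and a matching post-normalization check; it is not the normalizer's implementation. The helper names, the `known` mapping, and the example ID strings are assumptions made for illustration; only the four field names come from the steps above.

```python
LINK_FIELDS = ("linked_assumptions", "linked_symbols")


def partition_ids(ids, known_ids):
    """Split ids into (resolved, unresolved), preserving input order."""
    resolved = [i for i in ids if i in known_ids]
    unresolved = [i for i in ids if i not in known_ids]
    return resolved, unresolved


def check_claim_links(claim, known):
    """Return problems found in one normalized claim.

    known maps each link field name to the set of IDs considered valid
    (hypothetical input; where those IDs come from is not shown here).
    """
    problems = []
    for field in LINK_FIELDS:
        for i in claim.get(field, []):
            if i not in known[field]:
                problems.append(f"{field} contains unknown ID {i!r}")
        for i in claim.get(field + "_unresolved", []):
            if i in known[field]:
                problems.append(f"{field}_unresolved contains known ID {i!r}")
    return problems
```

An empty result from `check_claim_links` for every claim means valid and unknown IDs ended up in the expected fields.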

- Add an invalid `llm_claim_proposals.json`, `llm_lean_proposals.json`, or `suggested_assumptions.json` under a paper directory.
- Run `validate-all` or the gate runner.
- Expect stderr warnings for suggestion sidecars; pipeline must not fail solely on invalid optional sidecars.
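The warn-but-do-not-fail behavior expected above can be pictured as follows. This is a sketch under stated assumptions, not the validator itself: it treats "invalid" as "not parseable JSON" (the real schema rules are not described here), and the function name and warning wording are hypothetical.

```python
import json
import sys

# Optional suggestion sidecars named in the steps above.
OPTIONAL_SIDECARS = (
    "llm_claim_proposals.json",
    "llm_lean_proposals.json",
    "suggested_assumptions.json",
)


def warn_on_invalid_sidecars(sidecar_texts, stream=None):
    """Warn about each unparseable optional sidecar without raising.

    sidecar_texts maps a file name to its raw contents; missing names
    are treated as absent sidecars. Returns the warnings that were written.
    """
    stream = stream if stream is not None else sys.stderr
    warnings = []
    for name in OPTIONAL_SIDECARS:
        text = sidecar_texts.get(name)
        if text is None:
            continue  # an absent optional sidecar is not a problem
        try:
            json.loads(text)
        except json.JSONDecodeError as exc:
            message = f"warning: {name}: invalid JSON ({exc.msg})"
            warnings.append(message)
            print(message, file=stream)
    return warnings
```

Because the function only reports and never raises, a caller can keep its exit status tied to required artifacts while still surfacing sidecar problems on stderr.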

- Run `llm-lean-proposals` with `--use-fake-provider` or a live model; expect `llm_lean_proposals.json` only (no `formal/` writes).
- Review JSON; run `llm-lean-proposals-to-apply-bundle` for one `proposal_id` with non-empty `replacements`.
- Run `proof-repair-apply` without `--apply`, then only apply on a branch after human sign-off. See llm-lean-live-test-matrix.md.

- Edit `manifest.json` to set a fake `dependency_graph` and `kernel_index` that do not match the corpus.
- Run `publish-artifacts` for that paper without `SM_PUBLISH_REUSE_MANIFEST_GRAPHS`.
- Expect `dependency_graph` and `kernel_index` recomputed from current theorem cards and `corpus/kernels.json`.
- Repeat publish twice with no source edits; expect stable `build_hash` and `build_hash_version` `2` (see `test_double_publish_identical_build_hash`).
- With `SM_PUBLISH_REUSE_MANIFEST_GRAPHS=1`, expect prior manifest graph fields to be retained when still present.
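The publish checks above reduce to comparing a few manifest fields across runs. The sketch below assumes that `build_hash`, `build_hash_version`, `dependency_graph`, and `kernel_index` are all top-level keys of `manifest.json` and that snapshots of the manifest are loaded as dicts; both helper names are hypothetical.

```python
GRAPH_FIELDS = ("dependency_graph", "kernel_index")


def unstable_fields(first_manifest, second_manifest):
    """Return the tracked fields that differ between two publishes."""
    tracked = ("build_hash", "build_hash_version") + GRAPH_FIELDS
    return [k for k in tracked if first_manifest.get(k) != second_manifest.get(k)]


def graphs_were_recomputed(tampered_manifest, published_manifest):
    """True if every fake graph field was replaced by a different value."""
    return all(
        tampered_manifest.get(k) != published_manifest.get(k) for k in GRAPH_FIELDS
    )
```

For the double-publish step, `unstable_fields` should return an empty list. For the tampering step, `graphs_were_recomputed` should be true without `SM_PUBLISH_REUSE_MANIFEST_GRAPHS` and false when the prior graph fields are retained.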

- `uv run --project pipeline ruff check pipeline/src/sm_pipeline`
- `uv run --project pipeline python -m sm_pipeline.cli validate-all`
- `uv run --project pipeline pytest pipeline/tests -q`
- `uv run --project pipeline python -m sm_pipeline.cli benchmark`
- `uv run python scripts/generate_repo_snapshot.py`
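To run the commands above in sequence and stop at the first failure, a small wrapper such as the following may be convenient. It is a sketch, not part of the repository: `run_all` and its injectable `runner` parameter are hypothetical, and it assumes the commands are run from the repository root with `uv` on the PATH.

```python
import shlex
import subprocess

# The five commands listed above, in the same order.
COMMANDS = [
    "uv run --project pipeline ruff check pipeline/src/sm_pipeline",
    "uv run --project pipeline python -m sm_pipeline.cli validate-all",
    "uv run --project pipeline pytest pipeline/tests -q",
    "uv run --project pipeline python -m sm_pipeline.cli benchmark",
    "uv run python scripts/generate_repo_snapshot.py",
]


def run_all(commands, runner=subprocess.run):
    """Run each command in order; return the first failing command or None.

    runner is injectable so the loop can be exercised without invoking uv.
    """
    for command in commands:
        result = runner(shlex.split(command))
        if result.returncode != 0:
            return command
    return None
```

`run_all(COMMANDS)` returns `None` when every step exits with status 0, and otherwise the command string that failed.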
See trust-boundary-and-extraction.md and ADR 0012.