End-to-end scenarios: trust hardening and LLM sidecars

Manual or scripted checks that mirror the comprehensive test plan. Automated coverage lives under pipeline/tests/ in test_trust_hardening_*.py, test_llm_proposals.py, test_llm_lean_proposals.py, and test_benchmark_runner.py.

Scenario A: Scaffold-only extraction with empty claims

Create a paper with metadata.json and no claims.json (or claims.json as []).
Run extract-claims with --mode scaffold_only (or with the default mode).
Expect a non-empty claims.json containing the placeholder claim, and an extraction_run.json that records extraction_mode and placeholder_claim_written: true.

Scenario B: Deterministic mode without scaffolding

Use the same paper layout as Scenario A, but run extract-claims with --mode deterministic (or llm_sidecar) and no pre-existing claims file.
Expect no placeholder claim: after normalization, claims.json is an empty array [], and placeholder_claim_written is false in extraction_run.json.
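
The expectations in Scenarios A and B can be checked with a short script. The following is a minimal sketch, not part of the pipeline: the helper name `check_extraction` and the mode table are assumptions for illustration, and the function takes already-parsed JSON so it makes no claim about where the files live.

```python
# Hypothetical helper; the mode names come from Scenarios A and B above.
EXPECT_PLACEHOLDER = {
    "scaffold_only": True,
    "deterministic": False,
    "llm_sidecar": False,
}


def check_extraction(mode: str, claims: list, run: dict) -> list[str]:
    """Return a list of mismatches; an empty list means the scenario passed.

    claims is the parsed claims.json, run is the parsed extraction_run.json.
    Assumes the paper started with no claims (or an empty array).
    """
    expected = EXPECT_PLACEHOLDER[mode]
    problems = []
    if expected:
        if not claims:
            problems.append("claims.json should contain the placeholder claim")
        if "extraction_mode" not in run:
            problems.append("extraction_run.json is missing extraction_mode")
    elif claims != []:
        problems.append("claims.json should be an empty array")
    if run.get("placeholder_claim_written") is not expected:
        problems.append(f"placeholder_claim_written should be {expected}")
    return problems
```

To use it, load both files with `json.loads` after the extract-claims run and pass the mode that was used; any returned strings describe what diverged from the scenario.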

Scenario C: Unresolved links after normalization

In claims.json, set linked_assumptions / linked_symbols to a mix of valid and unknown IDs (or use the _unresolved fields).
Run normalize-paper (or extract-claims, which runs normalize).
Expect valid IDs to remain in linked_assumptions / linked_symbols, unknown IDs to appear in linked_assumptions_unresolved / linked_symbols_unresolved, and normalization_report.json to contain matching fields.
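
The post-normalization expectation can be expressed as an invariant on each claim. This is a sketch of a check, not the pipeline's normalizer: the function name, the way known IDs are passed in, and the sample IDs in the usage note are all hypothetical.

```python
def check_claim_links(
    claim: dict, known_assumptions: set[str], known_symbols: set[str]
) -> list[str]:
    """Return a list of violations for one normalized claim (empty means OK)."""
    problems = []
    for field, known in (
        ("linked_assumptions", known_assumptions),
        ("linked_symbols", known_symbols),
    ):
        # Every ID left on the main field must be a known ID.
        stray = [i for i in claim.get(field, []) if i not in known]
        if stray:
            problems.append(f"{field} still holds unknown IDs: {stray}")
        # Every ID on the _unresolved field must be an unknown ID.
        misplaced = [i for i in claim.get(f"{field}_unresolved", []) if i in known]
        if misplaced:
            problems.append(f"{field}_unresolved holds known IDs: {misplaced}")
    return problems
```

For example, a claim with `linked_assumptions: ["A1"]` and `linked_assumptions_unresolved: ["A99"]` passes when `A1` is the only known assumption ID.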

Scenario D: Suggestion sidecars (warn-only)

Add an invalid llm_claim_proposals.json, llm_lean_proposals.json, or suggested_assumptions.json under a paper directory.
Run validate-all or the gate runner.
Expect warnings on stderr for the invalid suggestion sidecars; the pipeline must not fail solely because an optional sidecar is invalid.
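
The warn-only behavior can be illustrated with a small sketch. It is not the pipeline's validator: it only treats unparseable JSON as invalid, whereas the real checks may also cover schema errors, and the function name is hypothetical. The point it shows is the control flow, where a bad sidecar produces a stderr warning and never an exception or a failing exit code.

```python
import json
import sys


def warn_on_invalid_sidecars(sidecars: dict[str, str]) -> list[str]:
    """Warn on stderr for each sidecar whose text is not valid JSON.

    sidecars maps a file name to its raw text. Returns the names that were
    flagged; the caller should not turn this list into a failure.
    """
    invalid = []
    for name, raw_text in sidecars.items():
        try:
            json.loads(raw_text)
        except json.JSONDecodeError as exc:
            print(f"warning: {name}: invalid suggestion sidecar: {exc}", file=sys.stderr)
            invalid.append(name)
    return invalid
```

A caller would read llm_claim_proposals.json, llm_lean_proposals.json, and suggested_assumptions.json from the paper directory when they exist, pass their contents in, and continue regardless of the result.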

Scenario F: LLM Lean assist (suggest-only, human-gated apply)

Run llm-lean-proposals with --use-fake-provider or a live model; expect llm_lean_proposals.json only (no formal/ writes).
Review the JSON, then run llm-lean-proposals-to-apply-bundle for one proposal_id that has non-empty replacements.
Run proof-repair-apply without --apply first; apply only on a branch, and only after human sign-off. See llm-lean-live-test-matrix.md.
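
One way to confirm the "no formal/ writes" expectation is to hash the directory before and after the suggest-only steps. The sketch below assumes formal/ is a directory you can point at directly; the function names are hypothetical and nothing here is part of the pipeline.

```python
import hashlib
from pathlib import Path


def snapshot(root: Path) -> dict[str, str]:
    """Map each file under root to the SHA-256 hex digest of its contents."""
    if not root.is_dir():
        return {}  # a missing directory is treated as empty
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_paths(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List paths that were added, removed, or modified between two snapshots."""
    return sorted(
        p for p in before.keys() | after.keys() if before.get(p) != after.get(p)
    )
```

Take a snapshot of formal/ before running llm-lean-proposals and again after the dry run of proof-repair-apply; `changed_paths` should return an empty list until a human-approved apply happens.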

Scenario E: Publish recomputes manifest graphs

Edit manifest.json to set a fake dependency_graph and kernel_index that do not match the corpus.
Run publish-artifacts for that paper without SM_PUBLISH_REUSE_MANIFEST_GRAPHS.
Expect dependency_graph and kernel_index recomputed from current theorem cards and corpus/kernels.json.
Publish twice with no source edits; expect an identical build_hash across both runs and build_hash_version 2 (see test_double_publish_identical_build_hash).
With SM_PUBLISH_REUSE_MANIFEST_GRAPHS=1, expect prior manifest graph fields to be retained when still present.
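
The manifest expectations can be compared mechanically once each manifest.json is loaded. This is a minimal sketch under one loud assumption: that build_hash, build_hash_version, dependency_graph, and kernel_index are top-level keys of the manifest. The function names are hypothetical.

```python
GRAPH_FIELDS = ("dependency_graph", "kernel_index")


def stale_graph_fields(fake: dict, published: dict) -> list[str]:
    """Return graph fields that still equal the fake values after publish."""
    return [k for k in GRAPH_FIELDS if k in fake and published.get(k) == fake[k]]


def check_repeat_publish(first: dict, second: dict) -> list[str]:
    """Compare manifests from two publishes with no source edits in between."""
    problems = []
    if first.get("build_hash") is None:
        problems.append("first manifest has no build_hash")
    if first.get("build_hash") != second.get("build_hash"):
        problems.append("build_hash changed between publishes")
    if second.get("build_hash_version") != 2:
        problems.append("build_hash_version should be 2")
    return problems
```

Without SM_PUBLISH_REUSE_MANIFEST_GRAPHS, `stale_graph_fields` should return an empty list; with the variable set to 1, both field names are expected to come back because the prior values are retained.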

Related commands

uv run --project pipeline ruff check pipeline/src/sm_pipeline
uv run --project pipeline python -m sm_pipeline.cli validate-all
uv run --project pipeline pytest pipeline/tests -q
uv run --project pipeline python -m sm_pipeline.cli benchmark
uv run python scripts/generate_repo_snapshot.py

See trust-boundary-and-extraction.md and ADR 0012.
