examples: lm-eval bridge as flagship reference integration (+ in-toto attest beat)

Cuneyt Ozturk · claude · Cuneyt Ozturk · commit 0e6c30c77672 · 2026-06-17T20:48:18.000+03:00
Make the lm-evaluation-harness <-> PRML bridge the host-facing reference integration:
- add an [ATTEST] beat emitting the locked claim as an in-toto (ITE-6) Statement
  (falsify >=0.3.8): the embed hook a host that ingests SLSA/in-toto can drop in.
- README: frame it as the flagship reference integration, note it uses the REAL
  falsify_prml reference (no parallel schema), link docs/EMBED.md + lm-eval PR #3752.

This is the faithful, runnable proof for the embed bet — already on real PRML v0.1,
unlike the inspect/giskard adapters (which are on the obsolete schema; migration queued).
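
For reviewers unfamiliar with the envelope, here is a minimal sketch of the shape
the [ATTEST] beat prints. It is not the falsify implementation: the function name
`sketch_intoto_statement`, the predicate type URI, the subject name, and the use of
sorted-key JSON as a stand-in for PRML canonicalisation are all assumptions made for
illustration. Only the four top-level fields follow the in-toto Statement v1 layout.

```python
import hashlib
import json

# Hypothetical predicate type URI, for illustration only. The real value is
# whatever falsify's to_intoto_statement() emits.
PREDICATE_TYPE = "https://example.invalid/prml-claim"


def sketch_intoto_statement(manifest: dict, name: str = "prml-claim") -> dict:
    """Wrap a claim manifest in an in-toto Statement v1 envelope (sketch)."""
    # Assumption: compact sorted-key JSON stands in for PRML canonicalisation.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "_type": "https://in-toto.io/Statement/v1",
        "subject": [{"name": name, "digest": {"sha256": digest}}],
        "predicateType": PREDICATE_TYPE,
        "predicate": manifest,
    }


if __name__ == "__main__":
    manifest = {"task": "hellaswag", "metric": "acc_norm", "threshold": 0.75}
    stmt = sketch_intoto_statement(manifest)
    print(stmt["_type"])
    print(stmt["predicateType"])
    print(stmt["subject"][0]["name"], stmt["subject"][0]["digest"]["sha256"][:16])

    # Editing the threshold after the fact yields a different subject digest,
    # so the statement no longer matches the sealed claim.
    moved = dict(manifest, threshold=0.78)
    moved_stmt = sketch_intoto_statement(moved)
    assert moved_stmt["subject"][0]["digest"] != stmt["subject"][0]["digest"]
```

Because the subject digest is computed over the claim itself, a host can compare it
against the locked hash without understanding the predicate body.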

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
diff --git a/patterns/examples/README.md b/patterns/examples/README.md
@@ -25,5 +25,13 @@ python3 lm_eval_to_prml.py --mode lock --results results.json \
         --task hellaswag --metric acc_norm --threshold 0.75  # real run
 ```
 
-Real: PRML canonicalisation/hashing/verdicts (`falsify_prml`). Sample-modelled: a faithful
-in-file lm-eval results dict so it runs with no lm-eval install; `--results` accepts a real one.
+Real: PRML canonicalisation/hashing/verdicts (`falsify_prml` — the same reference the spec and the
+4 byte-equivalent impls use, no parallel schema). Sample-modelled: a faithful in-file lm-eval results
+dict so it runs with no lm-eval install; `--results` accepts a real one.
+
+**This is the flagship reference integration** — what it looks like to drop a pre-committed eval claim
+into a tool a platform already uses (lm-evaluation-harness, the most-used LLM harness). The demo's
+`[ATTEST]` step emits the locked claim as an **in-toto / ITE-6 Statement** (`falsify >= 0.3.8`), so a
+host that already ingests SLSA/in-toto can treat a pre-registered eval as one more predicate type — the
+3-function embed path is in [`docs/EMBED.md`](https://github.com/studio-11-co/falsify/blob/main/docs/EMBED.md).
+Related upstream thread: [lm-evaluation-harness PR #3752](https://github.com/EleutherAI/lm-evaluation-harness/pull/3752).
diff --git a/patterns/examples/lm_eval_to_prml.py b/patterns/examples/lm_eval_to_prml.py
@@ -181,6 +181,18 @@ def main():
     print("\n[VERIFY] after the run — check the result against the sealed bar")
     cmd_verify(lock, args)
 
+    # The embed hook: hand a host an in-toto (ITE-6) attestation of the locked claim,
+    # so a platform that already ingests SLSA/in-toto can treat the pre-registered eval
+    # as one more predicate type. (falsify >= 0.3.8)
+    if hasattr(prml, "to_intoto_statement"):
+        print("\n[ATTEST] emit the locked claim as an in-toto (ITE-6) Statement")
+        stmt = prml.to_intoto_statement(lock["manifest"])
+        print(f"  _type         : {stmt['_type']}")
+        print(f"  predicateType : {stmt['predicateType']}")
+        print(f"  subject[0]    : {stmt['subject'][0]['name']}  "
+              f"sha256={stmt['subject'][0]['digest']['sha256'][:16]}…  (== the PRML lock)")
+        print("  -> drop this into your evidence bundle / transparency log. See docs/EMBED.md.")
+
     print("\n[ADVERSARIAL] someone moves the threshold 0.75 -> 0.78 post-hoc")
     moved = json.loads(json.dumps(lock))
     moved["manifest"]["threshold"] = 0.78  # edit the manifest, keep the old locked hash