PRML Cookbook

Short, opinionated patterns for using PRML in real ML evaluation pipelines.

This is the field manual for the PRML specification. The spec tells you what a manifest is; the cookbook tells you how to use it without shooting yourself in the foot.

Every pattern is:

One page — read in under three minutes
Self-contained — the example runs end-to-end with the snippets shown
Failure-mode-first — what goes wrong is named before what goes right

Patterns

#	Pattern	When to use
1	Single-shot eval claim	One model, one benchmark, one number — the 90% case.
2	Multi-seed eval claim	When you report mean ± std over N seeds.
3	Streaming Elo / arena eval	Live leaderboards. (Uses v0.2 streaming variant.)
4	Dataset version pinning	Benchmarks evolve; how to commit to a specific revision.
5	CI gate via prml-verify-action	Block PRs that ship a model with a tampered eval claim.
6	Public registry anchoring	When and when not to publish your hash publicly.
7	Revocation	Withdrawing a manifest after publication. (v0.2 feature.)
8	Pre-registration without infrastructure	The minimum-viable workflow: a YAML file and `sha256sum`.
9	RLHF win-rate evaluations	Judge-model comparisons (AlpacaEval, MT-Bench, Arena-Hard).
10	Federated evaluation	Multi-org replication: shared hash, distinct producers, regulator-grade audit trail.
11	PRML + Sigstore for execution integrity	Closes the §8.1 gap: who ran the eval, when, against which exact artefacts.
12	PRML in Hugging Face model cards	Make the accuracy number on a published HF model card verifiable, not trust-me prose.
13	PRML + commit-reveal validation for independence attestation ▶ runnable	Closes the other §8.1 gap: structural proof that independent evaluators couldn't coordinate verdicts. Co-authored with ValiChord.

Anti-patterns

#	Anti-pattern	Why it bites
A1	Computing the hash after the run	The whole point is committing before.
A2	Editing the manifest "to fix a typo"	Any edit breaks the hash. Use revocation.
A3	Storing private data in the manifest	The hash is published; the manifest content might be too.
A4	Treating the hash as proof of truth	The hash proves commitment, not correctness.
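Anti-patterns A2 and A4 can be seen in a few lines. The sketch below assumes the hash is a SHA-256 over the raw manifest bytes; the manifest text and the helper names `commitment` and `verify` are hypothetical and not part of any PRML tooling.

```python
import hashlib
import hmac


def commitment(manifest_bytes: bytes) -> str:
    """Hex SHA-256 of the manifest bytes (assumed commitment scheme)."""
    return hashlib.sha256(manifest_bytes).hexdigest()


def verify(manifest_bytes: bytes, committed_hash: str) -> bool:
    """True only if the bytes are exactly the ones that were committed."""
    return hmac.compare_digest(commitment(manifest_bytes), committed_hash)


# Hypothetical manifest with a typo in it ("acuracy").
original = b"metric: acuracy\nthreshold: 0.90\n"
committed = commitment(original)

# A2: "fixing the typo" changes the bytes, so the published hash no longer matches.
fixed = original.replace(b"acuracy", b"accuracy")

print(verify(original, committed))  # True
print(verify(fixed, committed))     # False
```

A passing check shows only that the manifest is the one that was committed (A4). It says nothing about whether the claimed number is correct or whether the evaluation was run honestly.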

Reference

Identity levels (0–4): a non-normative ladder for the strength of the binding between the producer and the real-world authoring entity. Used by Pattern 11 and the v0.3 RFC.

Audit & compliance crosswalks

Subcategory-by-subcategory maps from major AI governance frameworks to PRML fields (FULL / PARTIAL / NONE tagged):

EU AI Act Article 12 — code-level pattern for the 2 December 2027 high-risk deadline
NIST AI RMF 1.0 — GOVERN / MAP / MEASURE / MANAGE subcategory map
ISO/IEC 42001:2023 — AI Management System clause-by-clause evidence map

Examples

Working code in examples/:

pytorch-imagenet/ — Full example: PRML manifest before a PyTorch ImageNet eval, hash committed, post-run verification
stable-baselines3-rl/ — RL agent on LunarLander-v2, mean episode reward claim, threshold direction >=
inspect-ai-refusal/ — Refusal-rate eval via Inspect AI, PRML pre-registration via falsify-inspect
huggingface-eval/ — lm-eval-harness integration, multi-task pre-registration

License

Documentation, patterns, examples: CC0 1.0 — public domain dedication. Mirror, fork, modify without attribution.
Any tooling: MIT.

Contributing

Pattern proposals welcome via PR. Each new pattern must:

Solve a real problem someone hit while implementing PRML
Be reproducible — name the tools and their versions
Include a "what doesn't work" section (we are not selling)
Be under 800 words

Open an issue first if you're unsure whether your pattern fits.

Authors

Cüneyt Öztürk
Contact: hello@falsify.dev · falsify.dev

Status

v0.1 stable. v0.2 RFC open through 2026-05-22 — spec.falsify.dev/v0.2-rfc.
The PRML JSON Schema is in the SchemaStore catalog (merged 2026-05-11), so *.prml.yaml files autocomplete in VS Code, JetBrains, Helix, Zed, and Cursor out of the box.

Contributing

See CONTRIBUTING.md and the good first issue label for scoped work.

Cite the spec: Öztürk, C. (2026). PRML v0.1. Zenodo. https://doi.org/10.5281/zenodo.20177839
