Skip to content

studio-11-co/mlflow-falsify

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

mlflow-falsify — automatic PRML manifest hash tagging for MLflow runs

PyPI version Python versions License: MIT DOI Spec: PRML v0.1

Drop a PRML manifest in your repo. Every MLflow run gets cryptographically bound to it. No workflow changes.

Install

pip install mlflow-falsify

The plugin is discovered automatically through MLflow's mlflow.run_context_provider entry point.

Usage

import mlflow

# .prml.yaml exists in CWD or any parent directory — that's all you need.
with mlflow.start_run():
    mlflow.log_metric("accuracy", 0.873)
    # The run now carries prml.manifest_hash and friends as tags.

What gets tagged

When a .prml.yaml or prml.yaml is found in the current directory or any ancestor, every run is tagged with:

  • prml.manifest_hash — SHA-256 of the canonical manifest bytes (PRML v0.1 §3)
  • prml.manifest_path — relative path to the discovered manifest
  • prml.version — manifest schema version (e.g. prml/0.1)
  • prml.metric — the pre-registered metric (e.g. accuracy)
  • prml.comparator — one of >=, >, ==, <=, <
  • prml.threshold — the numeric threshold, as a string
  • prml.dataset_id — the pre-registered dataset identifier

Missing or malformed fields are silently skipped. The provider never raises into your run.

HPO sweeps and tag scope

In an HPO sweep the same PRML claim repeats across thousands of runs, so emitting the 5 descriptive tags per-run becomes pure tag noise. As of v0.2.0 the plugin supports lifting them to experiment level:

export MLFLOW_FALSIFY_TAG_SCOPE=experiment
import mlflow_falsify

mlflow.set_experiment("credit-scorer-hpo")
mlflow_falsify.tag_experiment()  # idempotent; sets metric/comparator/threshold/dataset_id/version once

for params in hpo_grid:
    with mlflow.start_run():
        ...  # only prml.manifest_hash and prml.manifest_path attach per-run

Default behaviour (MLFLOW_FALSIFY_TAG_SCOPE=run or unset) is unchanged: all 7 tags attach per-run, backward-compatible with v0.1.x.

Why this matters

  • EU AI Act Article 12 evidence layer. Every logged run carries a tamper-evident pointer to the claim it was meant to test.
  • Eval reproducibility by default. The hash freezes metric, threshold, dataset, and seed before the experiment runs.
  • Audit trails for free. Reviewers can recompute the manifest hash from the YAML and compare it against your tracked runs.
  • No workflow change. Existing MLflow code is untouched — the plugin attaches via entry points.

Links

Audit & compliance crosswalks

Where the manifest hash this plugin attaches fits in major AI governance frameworks (FULL / PARTIAL / NONE tagged):

License

MIT. Copyright 2026 Cüneyt Öztürk.