Releases: Harperbot/metal-guard

v1.1.0

19 May 14:33
ac42b83

Added

  • First-run onboarding wizard: metal-guard with no arguments scans for
    recent kernel panics, explains them in plain language, and offers to
    install the shell guard.
  • metal-guard diagnose — panic scan + explanation only.
  • metal-guard guard install|uninstall|status — reversible shell guard
    that routes interactive-shell python through mlx-safe-python.

Changed

  • metal-guard with no arguments now runs onboarding instead of the
    status snapshot. Use metal-guard status for the snapshot.
  • Panic scanner (parse_panic_reports) now also scans
    /var/db/PanicReporter and ~/Library/Logs/DiagnosticReports, and
    reads modern .ips kernel-panic reports in addition to .panic.
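
A minimal sketch of the directory scan described above, assuming reports are picked up purely by file extension. The function name find_panic_reports is hypothetical and is not the package's parse_panic_reports; real .ips files in DiagnosticReports also include ordinary crash reports, so a content check would still be needed.

```python
from pathlib import Path

# Extensions named in the release note: legacy .panic and modern .ips reports.
PANIC_SUFFIXES = {".panic", ".ips"}


def find_panic_reports(directories):
    """Return candidate report files under the given directories, newest first.

    Hypothetical helper: it filters on extension only and does not inspect
    file contents, so non-panic .ips reports are not excluded here.
    """
    found = []
    for directory in directories:
        root = Path(directory).expanduser()
        try:
            entries = list(root.iterdir())
        except OSError:
            # Missing directory, or one that needs elevated permissions.
            continue
        for path in entries:
            if path.is_file() and path.suffix in PANIC_SUFFIXES:
                found.append(path)
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)
```

On macOS this could be called with the two locations listed above, for example find_panic_reports(["/var/db/PanicReporter", "~/Library/Logs/DiagnosticReports"]).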

Removed

  • Third-party runtime dependencies. metal-guard's core is now pure
    stdlib: requests (Ollama HTTP fallback) and psutil (memory
    detection) are replaced with urllib and vm_stat. pip install
    metal-guard pulls in no transitive dependencies.
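
For illustration, memory detection without psutil could look roughly like the sketch below, which parses the text that the macOS vm_stat tool prints. The function names and the choice of "free + inactive + speculative" as the available-memory estimate are assumptions, not the package's actual logic.

```python
import re
import subprocess


def parse_vm_stat(text):
    """Parse vm_stat output into (page_size_bytes, {counter_name: pages})."""
    size_match = re.search(r"page size of (\d+) bytes", text)
    page_size = int(size_match.group(1)) if size_match else 4096
    counters = {}
    for line in text.splitlines():
        # Counter lines look like 'Pages free:      12345.'
        match = re.match(r"^(.+?):\s+(\d+)\.?\s*$", line)
        if match:
            counters[match.group(1).strip().strip('"')] = int(match.group(2))
    return page_size, counters


def available_bytes(text):
    """Rough estimate (an assumption): free + inactive + speculative pages."""
    page_size, counters = parse_vm_stat(text)
    keys = ("Pages free", "Pages inactive", "Pages speculative")
    return page_size * sum(counters.get(key, 0) for key in keys)


def read_vm_stat():
    """Run vm_stat and return its stdout. Only works on macOS."""
    result = subprocess.run(["vm_stat"], capture_output=True, text=True, check=True)
    return result.stdout
```

Keeping the parser separate from the subprocess call means the parsing can be exercised on any platform with captured output.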

v1.0.0 — first stable release

19 May 02:17

First stable release.

v1.0.0 consolidates the complete L1–L13 defense stack and the KNOWN_PANIC_MODELS registry into a stable public API.

  • Subprocess safety wire — pre-spawn gate stack (memory load gate, crash-burst recovery lockout), in-flight child-death detection (≤200 ms via select on the process sentinel), a validated done-frame protocol, and prefill-exposure tracking with advisory worker rotation.
  • SpawnRefused spawn gate — hard-blocks runner construction for panic-tier models; override with METALGUARD_LOCAL_PANIC_MODEL_BLOCK_DISABLED=1.
  • Engine fallback chain: metal_guard.fallback_chain.call_with_fallback() walks a configured tier list (mlx / ollama), falling forward on any failure.
  • Observer mode: enabled with METALGUARD_MODE=observer.
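
The spawn gate can be pictured with the sketch below. It is a stand-in, not the package's code: the advisory table, the model IDs, the check_spawn_allowed helper, and its return values are all hypothetical. Only the SpawnRefused name, the panic tier, and the METALGUARD_LOCAL_PANIC_MODEL_BLOCK_DISABLED=1 override come from the notes.

```python
import logging
import os


class SpawnRefused(RuntimeError):
    """Stand-in for the exception named in the release notes."""


# Hypothetical advisory table; the real data lives in KNOWN_PANIC_MODELS.
ADVISORY_TIERS = {
    "example/panic-model": "panic",
    "example/flaky-model": "warn",
}

OVERRIDE_ENV = "METALGUARD_LOCAL_PANIC_MODEL_BLOCK_DISABLED"


def check_spawn_allowed(model_id, env=None):
    """Raise SpawnRefused for panic-tier models unless the override is set."""
    env = os.environ if env is None else env
    tier = ADVISORY_TIERS.get(model_id)
    if tier is None:
        return "ok"
    if tier == "panic" and env.get(OVERRIDE_ENV) != "1":
        raise SpawnRefused(f"{model_id} is listed at advisory tier 'panic'")
    # Either a lower tier, or a panic tier downgraded by the override.
    logging.warning("spawning %s despite advisory tier %r", model_id, tier)
    return "warned"
```

Passing env explicitly keeps the gate testable without mutating the process environment.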

No API removals relative to v0.24.0 — this release is the stable cut of the surface delivered across the W3–W14 work.

Full per-release history and the incident behind each layer: see CHANGELOG.md.

v0.24.0 — Drift-audit wave W14 — de-Harper cleanup sweep.

19 May 02:20

[0.24.0] — 2026-05-19

Drift-audit wave W14 — de-Harper cleanup sweep.

Changed

  • The G1 panic-cooldown gate in authorize_mlx_spawn now uses the
    package's own panic_gate.evaluate_panic_cooldown() instead of an
    optional external integration — G1 is now an always-on gate.
  • metal_guard_cli.py is now English-only and writes its breadcrumb
    fallback to ~/.cache/metal-guard/.

Removed

  • A stale internal PR working-draft file.

Notes

  • Whole-repo pre-publish audit: removed every remaining hardcoded
    private path, private-module reference, internal project name, and
    Traditional-Chinese comment. Package metadata still attributes the
    project to its publisher — that is intentional.

v0.23.0 — Drift-audit wave W13 — final special reconcile.

19 May 02:20

[0.23.0] — 2026-05-19

Drift-audit wave W13 — final special reconcile.

Added

  • MetalGuard.dynamic_headroom_gb() — a load-adaptive VRAM-headroom
    estimate (base headroom plus a per-concurrent-thread term and a
    KV-growth bonus).
  • SpawnRefused exception + the local-panic-advisory spawn gate in
    MLXSubprocessRunner.__init__. When a model's advisory tier is
    panic, construction is hard-blocked with SpawnRefused; the
    METALGUARD_LOCAL_PANIC_MODEL_BLOCK_DISABLED=1 env var downgrades
    the block to a warning. Lower advisory tiers warn only. (This
    completes the gate deferred from wave W11.) SpawnRefused is
    exported from the top-level facade.
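
As a rough illustration of the headroom estimate described above (base headroom, plus a per-concurrent-thread term, plus a KV-growth bonus), the sketch below uses a simple additive form. The function name, parameter names, and every default coefficient are assumptions chosen for the example and are not the values used by MetalGuard.dynamic_headroom_gb().

```python
def estimate_headroom_gb(concurrent_threads, kv_growth_gb=0.0,
                         base_gb=2.0, per_thread_gb=0.5):
    """Additive headroom estimate in GB.

    Hypothetical coefficients: base_gb is the fixed floor, per_thread_gb is
    added once per concurrent thread, and kv_growth_gb is an extra allowance
    for expected KV-cache growth.
    """
    threads = max(0, concurrent_threads)
    return base_gb + per_thread_gb * threads + max(0.0, kv_growth_gb)
```

With these placeholder defaults, four concurrent threads and 1 GB of expected KV growth give 2.0 + 4 × 0.5 + 1.0 = 5.0 GB.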

Notes

  • Verify-only findings (no code change): the general panic_cooldown
    API is already fully present in panic_gate; the panic_cooldown
    ack-reason hardening is Harper-deployment-specific and intentionally
    not ported. system_audit GPU-family detection is equivalent on
    both sides — the public-only bench_scoped_load is reverse drift
    out of this project's scope.
  • W13 is the final reconcile wave. Phase Final (the v1.0.0 release)
    is the only remaining step.

v0.22.0 — Drift-audit wave W12 — top-level dispatch reconcile.

19 May 02:20

[0.22.0] — 2026-05-19

Drift-audit wave W12 — top-level dispatch reconcile.

Added

  • metal_guard/fallback_chain.py — tier-by-tier dispatch over the
    process-local backends (mlx, ollama). On any tier failure, lockout,
    or unreachable daemon, the chain falls forward to the next tier; when
    all tiers are exhausted it returns an empty result with
    all_local_failed telemetry (no raise). Includes the
    workload_chains.json loader with a sha256 tamper watchdog and the
    L1/L2 layer guard (cloud backends are refused at construction time).
  • metal_guard/inference.py — the in-process MLX dispatch path:
    call_model, bench_scoped_load, pre_inference_guard,
    safe_generate, encode_image, plus the prefill / load-gate /
    process-lock pre-flight checks.
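
The fall-forward behaviour can be sketched as follows. This is an illustration under assumptions, not the signature of call_with_fallback(): the tier list shape, the result dict keys other than all_local_failed, and the backend callables are hypothetical.

```python
def call_with_fallback_sketch(tiers, prompt):
    """Try each (name, backend) tier in order and never raise.

    Returns a dict with the generated text, the tier that produced it, the
    errors collected along the way, and an all_local_failed flag.
    """
    errors = {}
    for name, backend in tiers:
        try:
            text = backend(prompt)
        except Exception as exc:  # any tier failure falls forward
            errors[name] = f"{type(exc).__name__}: {exc}"
            continue
        return {"text": text, "tier": name, "errors": errors,
                "all_local_failed": False}
    # All tiers exhausted: empty result plus telemetry, no exception.
    return {"text": "", "tier": None, "errors": errors,
            "all_local_failed": True}


def mlx_stub(prompt):
    # Mirrors the note below: the mlx tier is an unwired stub.
    raise NotImplementedError("mlx tier is unwired")


def fake_ollama(prompt):
    # Placeholder backend used only for this example.
    return f"echo: {prompt}"
```

Calling call_with_fallback_sketch([("mlx", mlx_stub), ("ollama", fake_ollama)], "hi") records the mlx failure and returns the ollama result.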

Notes

  • Both modules are accessible as submodules
    (metal_guard.fallback_chain / metal_guard.inference); the
    top-level facade __all__ is unchanged — re-exporting the dispatch
    layer through the facade is a deferred curation decision.
  • fallback_chain's mlx-tier dispatch is an unwired stub
    (NotImplementedError), ported as-is from the source — an mlx
    tier in a chain falls through to the next tier.
  • inference.py is ported as a single module; splitting a backend
    submodule out is a future refactor, not part of this reconcile.

v0.21.0 — Drift-audit wave W11 — subprocess_runner reconcile.

19 May 02:20

[0.21.0] — 2026-05-19

Drift-audit wave W11 — subprocess_runner reconcile.

Added

  • subprocess_runner.py reconciled against the upstream source — the
    public runner gains the full Wave A safety wire and the prefill /
    rotation wire:
    • load_gate + recovery_lockout pre-spawn gates in
      MLXSubprocessRunner.__init__.
    • child-death detection in generate() — a select.select
      sigchld arm (≤200ms) plus the pipe-EOF arm, each emitting a
      subprocess_early_death event and triggering recovery_lockout.
    • done-frame protocol dispatch — the parent validates
      protocol_version frames via done_frame.validate_frame; the
      worker emits the canonical done frame.
    • prefill-exposure / worker-rotation wire —
      _record_prefill_and_check_rotation, the should_rotate /
      rotation_verdict poll surface, and auto-rotation in
      call_model_isolated with auto_rotation_* events.
    • CadenceGuard + subprocess_inference_guard (B1) +
      gemma-4 generation flush in the worker.
    • resource_tracker cold-restart warning, CircuitBreaker
      cooldown gate, peak-memory capture, and VLM base64-image handling.
  • Wire kill-switch environment variables for emergency rollback:
    MLX_LOAD_GATE_DISABLED, MLX_RECOVERY_LOCKOUT_GATE_DISABLED,
    MLX_CHILD_DEATH_DETECT_DISABLED, MLX_SENTINEL_DETECT_DISABLED,
    MLX_DONE_FRAME_WIRE_DISABLED, METALGUARD_PREFILL_WIRE_DISABLED,
    METALGUARD_AUTO_ROTATE_DISABLED.
  • New test module tests/test_subprocess_runner_w11.py — the first
    dedicated regression coverage for the runner.
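
The two-armed child-death detection can be sketched with select.select over two file descriptors: the result pipe and a descriptor that becomes readable when the child exits (on Unix, multiprocessing.Process.sentinel is such a descriptor). The function name and return values are hypothetical, and the sketch assumes a Unix platform where select works on pipes.

```python
import os
import select


def wait_for_frame_or_death(result_fd, sentinel_fd, timeout=0.2):
    """Wait up to `timeout` seconds for data or for signs the child died.

    Returns ("frame", data), ("child_died", b"") or ("timeout", b"").
    """
    readable, _, _ = select.select([result_fd, sentinel_fd], [], [], timeout)
    if result_fd in readable:
        data = os.read(result_fd, 65536)
        if data:
            return ("frame", data)
        # Pipe-EOF arm: the write end was closed without sending a frame.
        return ("child_died", b"")
    if sentinel_fd in readable:
        # Sentinel arm: the process-exit descriptor fired first.
        return ("child_died", b"")
    return ("timeout", b"")
```

Checking the result pipe before the sentinel means a frame written just before exit is still delivered rather than reported as a crash.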

Notes

  • No new public API symbol — the runner's public surface
    (MLXSubprocessRunner / SubprocessCrashError /
    SubprocessTimeoutError / call_model_isolated /
    shutdown_all_workers) is unchanged; the facade __all__ is unchanged.
  • Deferred to W13: the SpawnRefused exception and the
    __init__ local-panic-advisory gate depend on the
    legacy panic-model schema → panic_registry reconcile, which is
    W13 scope. They are intentionally omitted from W11.
  • The worker's inline prefill-fit guard (prefill.require_prefill_fit)
    is retained — W11 ports drift in, it does not remove an existing
    public defense layer.

v0.20.0 — Drift-audit wave W10 — cross-process lock reconcile.

19 May 02:20

[0.20.0] — 2026-05-18

Drift-audit wave W10 — cross-process lock reconcile.

Added

  • Observer mode for acquire_mlx_lock. When METALGUARD_MODE=observer
    (and the mlx ≥ 0.31.2 / mlx-lm ≥ 0.31.3 version gate in process_mode
    passes), a lock conflict logs a warning and returns an advisory dict
    (mode="observer", advisory=True, actual_holder=…) instead of
    raising MLXLockConflict — and does not overwrite the lock file,
    so the existing holder stays visible. Defensive mode (the default)
    is unchanged: a conflict still raises.
  • Lock-file info dict now records three diagnostic keys — host
    (socket.gethostname()), python (interpreter version), and
    platform (platform.system()) — alongside the existing
    pid/label/started_at/cmdline.
  • MLXLockConflict messages now show a human-readable elapsed time
    ("running for 5m 3s") instead of the raw ISO timestamp.
  • Granular logging in read_mlx_lock (corrupt / malformed / stale lock
    files) and release_mlx_lock (success, and the not-our-lock path),
    plus a success log in acquire_mlx_lock.
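
The difference between defensive and observer mode might be sketched as below. This is a simplified stand-in rather than acquire_mlx_lock itself: the function name and JSON lock-file layout are assumptions, the mode is passed as an argument instead of being read from METALGUARD_MODE, and stale-lock reclaim, force override, and the version gate are left out.

```python
import json
import logging
import os
import platform
import socket
import sys
from datetime import datetime, timezone


class MLXLockConflict(RuntimeError):
    """Stand-in for the exception named in the release notes."""


def acquire_lock_sketch(lock_path, label, mode="defensive"):
    """Create the lock file, or report a conflict according to `mode`."""
    info = {
        "pid": os.getpid(),
        "label": label,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "cmdline": " ".join(sys.argv),
        "host": socket.gethostname(),
        "python": platform.python_version(),
        "platform": platform.system(),
    }
    try:
        # O_EXCL makes creation fail if another holder already exists.
        fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    except FileExistsError:
        with open(lock_path) as fh:
            holder = json.load(fh)
        if mode == "observer":
            # Advisory only: warn and leave the existing lock file untouched.
            logging.warning("lock held by pid %s", holder.get("pid"))
            return {"mode": "observer", "advisory": True,
                    "actual_holder": holder}
        raise MLXLockConflict(f"lock held by pid {holder.get('pid')}")
    with os.fdopen(fd, "w") as fh:
        json.dump(info, fh)
    return info
```

In observer mode the caller gets the holder's details back and decides what to do; in defensive mode the conflict surfaces as an exception.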

Internal

  • New private helpers in mlx_lock.py: _format_elapsed,
    _current_cmdline, _try_unlink (the last also swallows
    PermissionError, not just FileNotFoundError).
  • New test module tests/test_mlx_lock_w10.py (22 tests) — the first
    dedicated coverage for the cross-process lock, including parity
    regression guards for force override, stale reclaim, and the context
    manager.
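
A formatter producing the "5m 3s" style shown above could look like this. It is a sketch, not the private _format_elapsed helper; the hour-level format in particular is an assumption.

```python
def format_elapsed(seconds):
    """Render a duration in seconds as a short human-readable string."""
    seconds = max(0, int(seconds))
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"
```

For example, format_elapsed(303) gives "5m 3s", matching the message format quoted in the notes.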

Notes

  • No new public API symbol — acquire_mlx_lock / read_mlx_lock /
    release_mlx_lock / mlx_exclusive_lock / MLXLockConflict were
    already exported; the facade __all__ is unchanged.
  • Zombie detection and force override were already public (carved
    into mlx_lock.py at the Phase 1 package split) — W10 verified them
    in place and did not re-port them.

v0.19.0 — Drift-audit wave W9 — version advisory sync.

19 May 02:20

[0.19.0] — 2026-05-18

Drift-audit wave W9 — version advisory sync.

Added

  • Three version advisories synced from the upstream advisory database:
    • mlx-lm#1090 (<0.31.3, info) — pre-0.31.3 global generation
      stream; observer-mode ThreadPool parallelization races on Metal.
    • mlx-lm#1256 (>=0.31.3, high) — the #1090 thread-local stream
      fix is incomplete; generation on a non-import thread still crashes.
    • transformers#1011 (>=5.5.0.dev0,<5.6, high) — transformers
      5.5.x removed ReasoningEffort, breaking Gemma 4 loads via mlx-vlm;
      registered at the transformers package level so it fires regardless
      of the mlx-vlm version.
  • check_version_advisories() now reports 16 advisories (was 13).
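
To show how a version range such as <0.31.3 or >=0.31.3 selects an advisory, here is a simplified matcher. It is not how check_version_advisories() is implemented: the table layout and function names are hypothetical, and the comparison looks only at leading numeric release segments, so pre-release ranges like >=5.5.0.dev0 would need a proper version parser (for example the packaging library) instead.

```python
import operator
import re

_OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
        ">=": operator.ge, "==": operator.eq}


def _release_key(version):
    """Leading numeric segments, zero-padded to four; suffixes are dropped."""
    parts = []
    for part in version.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return (tuple(parts) + (0, 0, 0, 0))[:4]


def matches(installed, spec):
    """True if `installed` satisfies every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        m = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*(\S+)\s*", clause)
        if not m:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not _OPS[m.group(1)](_release_key(installed), _release_key(m.group(2))):
            return False
    return True


# (package, affected range, severity, advisory id) taken from the list above.
ADVISORIES = [
    ("mlx-lm", "<0.31.3", "info", "mlx-lm#1090"),
    ("mlx-lm", ">=0.31.3", "high", "mlx-lm#1256"),
]


def active_advisories(installed_versions):
    """Return advisory ids whose range matches the installed versions."""
    return [adv_id for pkg, spec, _severity, adv_id in ADVISORIES
            if pkg in installed_versions and matches(installed_versions[pkg], spec)]
```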

Notes

  • upstream_patches needed no port — install_upstream_defensive_patches
    / _patch_mlx_lm_1128 are already public in metal_guard.version_advisories
    (regression-locked by tests/test_version_advisories_w9.py).
  • Two non-safety observability helpers from the upstream module
    (_infer_fixed_in, log_active_advisories_at_startup) remain
    un-synced — deferred as minor forward drift.

v0.18.1 — Drift-audit wave W8 — forensics layer (verify-only)

19 May 02:20

[0.18.1] — 2026-05-18

Drift-audit wave W8 — forensics layer (verify-only, no functional change).

Notes

  • orphan_monitor, postmortem, and postmortem_collect needed no port:
    the Phase 1 package split already carried the orphan-monitor /
    postmortem / status-snapshot layer into metal_guard.forensics in
    de-Harper-ified form, and metal-guard orphan-scan /
    metal-guard postmortem are the canonical CLI entry points
    (superseding the python -m postmortem_collect shim).
    Regression-locked by tests/test_w8_already_public.py.
  • No public API change; this is a patch release marking the W8
    reconciliation milestone.

v0.18.0 — Drift-audit wave W7 — OOM / prefill layer.

19 May 02:20

[0.18.0] — 2026-05-18

Drift-audit wave W7 — OOM / prefill layer.

Added

  • metal_guard.oom_precheck — high-level pre-call OOM gate wrapping
    the prefill-allocation math with model-dims / GPU-memory auto-fetch
    (precheck_inference, precheck_inference_or_raise, OOMVerdict).
  • metal_guard.prefill_exposure_tracker — per-worker cumulative
    prefill-tokens lifetime tracker for the delayed-trigger panic class
    (record_prefill, get_state, forget_worker, all_states,
    WorkerPrefillState).
  • metal_guard.worker_rotation — advisory prefill-budget worker
    rotation gate (should_rotate_worker, RotationVerdict,
    REASON_DISABLED / REASON_NO_STATE / REASON_UNDER_BUDGET /
    REASON_PREFILL_BUDGET_EXCEEDED).
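
The relationship between the exposure tracker and the rotation gate can be sketched as follows. The REASON_* names come from the list above, but their string values, the function signatures, the state class, and the budget parameter are assumptions made for this example.

```python
from dataclasses import dataclass

# Constant names are from the release notes; the values are placeholders.
REASON_DISABLED = "disabled"
REASON_NO_STATE = "no_state"
REASON_UNDER_BUDGET = "under_budget"
REASON_PREFILL_BUDGET_EXCEEDED = "prefill_budget_exceeded"


@dataclass
class WorkerState:
    worker_id: str
    prefill_tokens: int = 0  # cumulative over the worker's lifetime


_states = {}


def record_prefill(worker_id, tokens):
    """Add `tokens` to the worker's lifetime prefill total."""
    state = _states.setdefault(worker_id, WorkerState(worker_id))
    state.prefill_tokens += tokens
    return state


def should_rotate(worker_id, budget_tokens, enabled=True):
    """Advisory verdict: (rotate?, reason). The caller decides what to do."""
    if not enabled:
        return (False, REASON_DISABLED)
    state = _states.get(worker_id)
    if state is None:
        return (False, REASON_NO_STATE)
    if state.prefill_tokens > budget_tokens:
        return (True, REASON_PREFILL_BUDGET_EXCEEDED)
    return (False, REASON_UNDER_BUDGET)
```

Because the verdict is advisory, nothing here restarts a worker; it only reports when the cumulative prefill total has passed the budget.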

Changed

  • Public API surface grows by 14 names; tests/test_phase1_facade.py
    surface lock updated 151 -> 165.

Notes

  • prefill_guard needed no port — its full surface already lives in
    metal_guard.prefill since the Phase 1 refactor (regression-locked by
    tests/test_w7_already_public.py).