Releases: Harperbot/metal-guard
v1.1.0
Added
- First-run onboarding wizard: metal-guard with no arguments scans for
recent kernel panics, explains them in plain language, and offers to
install the shell guard.
- metal-guard diagnose: panic scan + explanation only.
- metal-guard guard install|uninstall|status: reversible shell guard
that routes interactive-shell python through mlx-safe-python.
Changed
- metal-guard with no arguments now runs onboarding instead of the
status snapshot. Use metal-guard status for the snapshot.
- Panic scanner (parse_panic_reports) now also scans
/var/db/PanicReporter and ~/Library/Logs/DiagnosticReports, and
reads modern .ips kernel-panic reports in addition to .panic.
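As a rough illustration of the wider scan described above, the sketch below collects candidate report files from the two directories the note names. It is not metal-guard's parse_panic_reports: the function name find_report_files and its return shape are hypothetical, and it only filters by file extension.

```python
from pathlib import Path

# Directories and suffixes come from the release note; everything else here
# is a hypothetical illustration, not metal-guard's parse_panic_reports.
DEFAULT_DIRS = (
    Path("/var/db/PanicReporter"),
    Path.home() / "Library" / "Logs" / "DiagnosticReports",
)
REPORT_SUFFIXES = {".panic", ".ips"}


def find_report_files(dirs=DEFAULT_DIRS):
    """Return candidate report files, newest first, skipping unreadable dirs."""
    found = []
    for directory in dirs:
        try:
            entries = list(Path(directory).iterdir())
        except OSError:
            # Missing directory or insufficient permission: skip it.
            continue
        found.extend(
            p for p in entries if p.suffix in REPORT_SUFFIXES and p.is_file()
        )
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)
```

An .ips file is not necessarily a kernel panic, so a real scanner would still need to inspect each file's contents before reporting it.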
Removed
- Third-party runtime dependencies. metal-guard's core is now pure
stdlib: requests (Ollama HTTP fallback) and psutil (memory
detection) are replaced with urllib and vm_stat. pip install metal-guard pulls nothing transitive.
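To show what replacing psutil with vm_stat can look like, here is a minimal sketch that parses vm_stat output using only the standard library. The function names are hypothetical, and treating "free + inactive" pages as available memory is an assumption of this sketch, not a statement of how metal-guard computes it.

```python
import re
import subprocess


def parse_vm_stat(text):
    """Parse vm_stat output into (page_size_bytes, {label: page_count})."""
    size_match = re.search(r"page size of (\d+) bytes", text)
    if size_match is None:
        raise ValueError("page size header not found in vm_stat output")
    counts = {}
    for line in text.splitlines():
        # Data lines look like 'Pages free:      12345.'
        m = re.match(r"^(.+?):\s+(\d+)\.?\s*$", line)
        if m:
            counts[m.group(1).strip().strip('"')] = int(m.group(2))
    return int(size_match.group(1)), counts


def approx_available_bytes(text):
    """Assumption for this sketch: available = free + inactive pages."""
    page_size, counts = parse_vm_stat(text)
    pages = counts.get("Pages free", 0) + counts.get("Pages inactive", 0)
    return pages * page_size


def read_vm_stat():
    """Run vm_stat (macOS only) and return its stdout."""
    result = subprocess.run(
        ["vm_stat"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On a Mac, approx_available_bytes(read_vm_stat()) gives a byte estimate; the parser itself can be exercised anywhere with captured output.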
v1.0.0 — first stable release
First stable release.
v1.0.0 consolidates the complete L1–L13 defence stack and the KNOWN_PANIC_MODELS registry into a stable public API.
- Subprocess safety wire: pre-spawn gate stack (memory load gate, crash-burst recovery lockout), in-flight child-death detection (≤200 ms via select on the process sentinel), a validated done-frame protocol, and prefill-exposure tracking with advisory worker rotation.
- SpawnRefused spawn gate: hard-blocks runner construction for panic-tier models; override with METALGUARD_LOCAL_PANIC_MODEL_BLOCK_DISABLED=1.
- Engine fallback chain: metal_guard.fallback_chain.call_with_fallback() walks a configured tier list (mlx / ollama), falling forward on any failure.
- Observer mode: METALGUARD_MODE=observer.
No API removals relative to v0.24.0 — this release is the stable cut of the surface delivered across the W3–W14 work.
Full per-release history and the incident behind each layer: see CHANGELOG.md.
v0.24.0 — Drift-audit wave W14 — de-Harper cleanup sweep.
[0.24.0] — 2026-05-19
Drift-audit wave W14 — de-Harper cleanup sweep.
Changed
- The G1 panic-cooldown gate in authorize_mlx_spawn now uses the
package's own panic_gate.evaluate_panic_cooldown() instead of an
optional external integration, so G1 is now an always-on gate.
- metal_guard_cli.py is now English-only and writes its breadcrumb
fallback to ~/.cache/metal-guard/.
Removed
- A stale internal PR working-draft file.
Notes
- Whole-repo pre-publish audit: removed every remaining hardcoded
private path, private-module reference, internal project name, and
Traditional-Chinese comment. Package metadata still attributes the
project to its publisher — that is intentional.
v0.23.0 — Drift-audit wave W13 — final special reconcile.
[0.23.0] — 2026-05-19
Drift-audit wave W13 — final special reconcile.
Added
- MetalGuard.dynamic_headroom_gb(): a load-adaptive VRAM-headroom
estimate (base headroom plus a per-concurrent-thread term and a
KV-growth bonus).
- SpawnRefused exception + the local-panic-advisory spawn gate in
MLXSubprocessRunner.__init__. When a model's advisory tier is
panic, construction is hard-blocked with SpawnRefused; the
METALGUARD_LOCAL_PANIC_MODEL_BLOCK_DISABLED=1 env var downgrades
the block to a warning. Lower advisory tiers warn only. (This
completes the gate deferred from wave W11.) SpawnRefused is
exported from the top-level facade.
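The spawn-gate behaviour described above (hard block for the panic tier, warning when the override is set, warning only for lower tiers) can be sketched as follows. The SpawnRefused class here is a local stand-in, and check_spawn_allowed, its parameters, and the message wording are hypothetical; only the tier name panic and the environment variable come from the note.

```python
import os
import warnings


class SpawnRefused(RuntimeError):
    """Local stand-in for the exception named in the release note."""


def check_spawn_allowed(model_id, advisory_tier, env=None):
    """Raise SpawnRefused for panic-tier models unless the override is set.

    advisory_tier is None when the model has no advisory (hypothetical
    convention for this sketch).
    """
    env = os.environ if env is None else env
    if advisory_tier is None:
        return
    override = env.get("METALGUARD_LOCAL_PANIC_MODEL_BLOCK_DISABLED") == "1"
    if advisory_tier == "panic" and not override:
        raise SpawnRefused(f"{model_id}: advisory tier is 'panic'")
    # Lower tiers, or a panic tier with the override set, only warn.
    warnings.warn(f"{model_id}: advisory tier {advisory_tier!r}")
```

Passing the environment as a parameter keeps the gate easy to test without mutating os.environ.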
Notes
- Verify-only findings (no code change): the general panic_cooldown
API is already fully present in panic_gate; the panic_cooldown
ack-reason hardening is Harper-deployment-specific and intentionally
not ported. system_audit GPU-family detection is equivalent on
both sides; the public-only bench_scoped_load is reverse drift
out of this project's scope.
- W13 is the final reconcile wave. Phase Final (the v1.0.0 release)
is the only remaining step.
v0.22.0 — Drift-audit wave W12 — top-level dispatch reconcile.
[0.22.0] — 2026-05-19
Drift-audit wave W12 — top-level dispatch reconcile.
Added
- metal_guard/fallback_chain.py: tier-by-tier dispatch over the
process-local backends (mlx, ollama). On any tier failure /
lockout / daemon-unreachable the chain falls forward to the next
tier; when all tiers are exhausted it returns an empty result with
all_local_failed telemetry (no raise). Includes the
workload_chains.json loader with a sha256 tamper watchdog and the
L1/L2 layer guard (cloud backends are refused at construction time).
- metal_guard/inference.py, the in-process MLX dispatch path:
call_model, bench_scoped_load, pre_inference_guard,
safe_generate, encode_image, plus the prefill / load-gate /
process-lock pre-flight checks.
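One way to read "sha256 tamper watchdog" is a loader that refuses a config file whose digest differs from a pinned value. The sketch below shows that idea only; the function name is hypothetical, and how metal-guard actually stores or compares the expected digest is not stated in these notes.

```python
import hashlib
import hmac
import json


def load_json_if_untampered(path, expected_sha256):
    """Load a JSON file only if its sha256 matches the pinned hex digest."""
    with open(path, "rb") as fh:
        raw = fh.read()
    actual = hashlib.sha256(raw).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    if not hmac.compare_digest(actual, expected_sha256.lower()):
        raise ValueError(f"{path}: sha256 mismatch, refusing to load")
    return json.loads(raw)
```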
Notes
- Both modules are accessible as submodules
(metal_guard.fallback_chain / metal_guard.inference); the
top-level facade __all__ is unchanged. Re-exporting the dispatch
layer through the facade is a deferred curation decision.
- fallback_chain's mlx-tier dispatch is an unwired stub
(NotImplementedError), ported as-is from the source, so an mlx
tier in a chain falls through to the next tier.
- inference.py is ported as a single module; splitting a backend
submodule out is a future refactor, not part of this reconcile.
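The fall-forward behaviour in this release, including the mlx stub that falls through and the no-raise result when every tier fails, can be sketched like this. The function name, the (name, callable) tier shape, and the telemetry dictionary keys other than all_local_failed are hypothetical; this is not the signature of call_with_fallback().

```python
def mlx_stub(prompt):
    """Mirrors the note: the mlx tier is an unwired stub."""
    raise NotImplementedError("mlx tier is not wired")


def call_with_fallback_sketch(tiers, prompt):
    """Try each (name, backend) tier in order; never raise.

    Returns (text, telemetry). An empty text with all_local_failed=True
    means every tier failed.
    """
    failures = []
    for name, backend in tiers:
        try:
            text = backend(prompt)
        except Exception as exc:
            # Any failure falls forward to the next tier.
            failures.append((name, type(exc).__name__))
            continue
        return text, {
            "tier": name,
            "failures": failures,
            "all_local_failed": False,
        }
    return "", {"tier": None, "failures": failures, "all_local_failed": True}
```

Returning telemetry instead of raising lets the caller decide what an exhausted chain means, which matches the "no raise" wording in the note.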
v0.21.0 — Drift-audit wave W11 — subprocess_runner reconcile.
[0.21.0] — 2026-05-19
Drift-audit wave W11 — subprocess_runner reconcile.
Added
- subprocess_runner.py reconciled against the upstream source. The
public runner gains the full Wave A safety wire and the prefill /
rotation wire:
  - load_gate + recovery_lockout pre-spawn gates in
    MLXSubprocessRunner.__init__.
  - child-death detection in generate(): a select.select
    sigchld arm (≤200ms) plus the pipe-EOF arm, each emitting a
    subprocess_early_death event and triggering recovery_lockout.
  - done-frame protocol dispatch: the parent validates
    protocol_version frames via done_frame.validate_frame; the
    worker emits the canonical done frame.
  - prefill-exposure / worker-rotation wire:
    _record_prefill_and_check_rotation, the should_rotate /
    rotation_verdict poll surface, and auto-rotation in
    call_model_isolated with auto_rotation_* events.
  - CadenceGuard + subprocess_inference_guard (B1) +
    gemma-4 generation flush in the worker.
  - resource_tracker cold-restart warning, CircuitBreaker
    cooldown gate, peak-memory capture, and VLM base64-image handling.
- Wire kill-switch environment variables for emergency rollback:
MLX_LOAD_GATE_DISABLED, MLX_RECOVERY_LOCKOUT_GATE_DISABLED,
MLX_CHILD_DEATH_DETECT_DISABLED, MLX_SENTINEL_DETECT_DISABLED,
MLX_DONE_FRAME_WIRE_DISABLED, METALGUARD_PREFILL_WIRE_DISABLED,
METALGUARD_AUTO_ROTATE_DISABLED.
- New test module tests/test_subprocess_runner_w11.py: the first
dedicated regression coverage for the runner.
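To make the pipe-EOF arm of child-death detection concrete, here is a minimal POSIX-only sketch: the parent polls the child's stdout with select.select on a 200 ms timeout and treats an empty read as the child having gone away. It covers only the pipe-EOF idea, not the sentinel arm, events, or lockout. ChildDied, read_line_or_detect_death, and the max_polls parameter are hypothetical names.

```python
import os
import select
import subprocess
import sys

POLL_INTERVAL_S = 0.2  # mirrors the "≤200ms" figure in the note


class ChildDied(RuntimeError):
    """Hypothetical error type for this sketch."""


def read_line_or_detect_death(proc, max_polls=50):
    """Return the child's first stdout line, or raise if the pipe hits EOF."""
    fd = proc.stdout.fileno()
    buf = b""
    for _ in range(max_polls):
        ready, _, _ = select.select([fd], [], [], POLL_INTERVAL_S)
        if not ready:
            continue  # nothing yet; poll again
        chunk = os.read(fd, 4096)
        if not chunk:
            # EOF: every write end of the pipe is closed, so the child
            # exited (or closed stdout) before sending a full line.
            raise ChildDied("child closed its pipe before finishing")
        buf += chunk
        if b"\n" in buf:
            return buf.split(b"\n", 1)[0].decode()
    raise TimeoutError("no complete line within the polling budget")


def spawn(code):
    """Start a Python child that runs the given code string."""
    return subprocess.Popen(
        [sys.executable, "-c", code], stdout=subprocess.PIPE
    )
```

The sketch reads from the raw file descriptor with os.read so that buffered reads on proc.stdout never block past the select timeout.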
Notes
- No new public API symbol: the runner's public surface
(MLXSubprocessRunner / SubprocessCrashError /
SubprocessTimeoutError / call_model_isolated /
shutdown_all_workers) is unchanged; the facade __all__ is unchanged.
- Deferred to W13: the SpawnRefused exception and the
__init__ local-panic-advisory gate depend on the
legacy panic-model schema → panic_registry reconcile, which is
W13 scope. They are intentionally omitted from W11.
- The worker's inline prefill-fit guard (prefill.require_prefill_fit)
is retained: W11 ports drift in, it does not remove an existing
public defense layer.
v0.20.0 — Drift-audit wave W10 — cross-process lock reconcile.
[0.20.0] — 2026-05-18
Drift-audit wave W10 — cross-process lock reconcile.
Added
- Observer mode for acquire_mlx_lock. When METALGUARD_MODE=observer
(and the mlx ≥ 0.31.2 / mlx-lm ≥ 0.31.3 version gate in process_mode
passes), a lock conflict logs a warning and returns an advisory dict
(mode="observer", advisory=True, actual_holder=…) instead of
raising MLXLockConflict, and does not overwrite the lock file,
so the existing holder stays visible. Defensive mode (the default)
is unchanged: a conflict still raises.
- Lock-file info dict now records three diagnostic keys: host
(socket.gethostname()), python (interpreter version), and
platform (platform.system()), alongside the existing
pid / label / started_at / cmdline.
- MLXLockConflict messages now show a human-readable elapsed time
("running for 5m 3s") instead of the raw ISO timestamp.
- Granular logging in read_mlx_lock (corrupt / malformed / stale lock
files) and release_mlx_lock (success, and the not-our-lock path),
plus a success log in acquire_mlx_lock.
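The difference between defensive and observer mode on a lock conflict can be sketched with a small file lock. This is an illustration under stated assumptions, not the package's acquire_mlx_lock: the function names and the hour format are hypothetical, MLXLockConflict is a local stand-in, the version gate and stale-lock handling are omitted, and platform.python_version() is one guess at how the interpreter version is recorded.

```python
import json
import logging
import os
import platform
import socket
import sys
from datetime import datetime, timezone

log = logging.getLogger("lock_sketch")


class MLXLockConflict(RuntimeError):
    """Local stand-in for the exception named in the release note."""


def format_elapsed(seconds):
    """303 -> '5m 3s' (the hour form is an assumption of this sketch)."""
    seconds = max(0, int(seconds))
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}h {minutes}m {secs}s"
    if minutes:
        return f"{minutes}m {secs}s"
    return f"{secs}s"


def acquire_lock_sketch(path, label, mode="defensive"):
    """Create the lock file, or handle a conflict according to mode."""
    info = {
        "pid": os.getpid(),
        "label": label,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "cmdline": " ".join(sys.argv),
        "host": socket.gethostname(),
        "python": platform.python_version(),
        "platform": platform.system(),
    }
    try:
        # O_EXCL makes creation fail if another holder's file exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        with open(path, encoding="utf-8") as fh:
            holder = json.load(fh)
        started = datetime.fromisoformat(holder["started_at"])
        elapsed = (datetime.now(timezone.utc) - started).total_seconds()
        message = (
            f"lock held by pid {holder['pid']} ({holder['label']}), "
            f"running for {format_elapsed(elapsed)}"
        )
        if mode == "observer":
            # Advisory only: warn and leave the holder's file untouched.
            log.warning(message)
            return {"mode": "observer", "advisory": True, "actual_holder": holder}
        raise MLXLockConflict(message) from None
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(info, fh)
    return info
```

A second process could observe the file between creation and the JSON write; a production lock would need to handle that window.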
Internal
- New private helpers in mlx_lock.py: _format_elapsed,
_current_cmdline, _try_unlink (the last also swallows
PermissionError, not just FileNotFoundError).
- New test module tests/test_mlx_lock_w10.py (22 tests): the first
dedicated coverage for the cross-process lock, including parity
regression guards for force override, stale reclaim, and the context
manager.
Notes
- No new public API symbol: acquire_mlx_lock / read_mlx_lock /
release_mlx_lock / mlx_exclusive_lock / MLXLockConflict were
already exported; the facade __all__ is unchanged.
- Zombie detection and force override were already public (carved
into mlx_lock.py at the Phase 1 package split); W10 verified them
in place and did not re-port them.
v0.19.0 — Drift-audit wave W9 — version advisory sync.
[0.19.0] — 2026-05-18
Drift-audit wave W9 — version advisory sync.
Added
- Three version advisories synced from the upstream advisory database:
  - mlx-lm#1090 (<0.31.3, info): pre-0.31.3 global generation
    stream; observer-mode ThreadPool parallelization races on Metal.
  - mlx-lm#1256 (>=0.31.3, high): the #1090 thread-local stream
    fix is incomplete; generation on a non-import thread still crashes.
  - transformers#1011 (>=5.5.0.dev0,<5.6, high): transformers
    5.5.x removed ReasoningEffort, breaking Gemma 4 loads via mlx-vlm;
    registered at the transformers package level so it fires regardless
    of the mlx-vlm version.
- check_version_advisories() now reports 16 advisories (was 13).
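The version ranges attached to the three advisories above can be evaluated with a small stdlib-only matcher. This is a simplified sketch, not check_version_advisories(): it compares numeric release segments only and ignores dev/pre-release suffixes, so 5.5.0.dev0 is treated as 5.5.0, which differs from full PEP 440 ordering. Keying the two mlx-lm advisories on a package named mlx-lm is an assumption.

```python
import operator
import re

OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}

# (package, advisory id, version range, severity), taken from the note.
ADVISORIES = [
    ("mlx-lm", "mlx-lm#1090", "<0.31.3", "info"),
    ("mlx-lm", "mlx-lm#1256", ">=0.31.3", "high"),
    ("transformers", "transformers#1011", ">=5.5.0.dev0,<5.6", "high"),
]


def release_tuple(version, width=4):
    """'0.31.3' -> (0, 31, 3, 0); stops at the first non-numeric segment."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if m is None:
            break
        parts.append(int(m.group()))
    parts = parts[:width]
    return tuple(parts + [0] * (width - len(parts)))


def matches(version, spec):
    """True if version satisfies every comma-separated clause in spec."""
    current = release_tuple(version)
    for clause in spec.split(","):
        m = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*(\S+)\s*", clause)
        if m is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not OPS[m.group(1)](current, release_tuple(m.group(2))):
            return False
    return True


def active_advisories(installed):
    """installed maps package name to version string; returns advisory ids."""
    return [
        advisory_id
        for package, advisory_id, spec, _severity in ADVISORIES
        if package in installed and matches(installed[package], spec)
    ]
```

Padding every version to a fixed width keeps 0.31 and 0.31.0 equal under tuple comparison.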
Notes
- upstream_patches needed no port: install_upstream_defensive_patches
/ _patch_mlx_lm_1128 are already public in metal_guard.version_advisories
(regression-locked by tests/test_version_advisories_w9.py).
- Two non-safety observability helpers from the upstream module
(_infer_fixed_in, log_active_advisories_at_startup) remain
un-synced, deferred as minor forward drift.
v0.18.1 — Drift-audit wave W8 — forensics layer (verify-only)
[0.18.1] — 2026-05-18
Drift-audit wave W8 — forensics layer (verify-only, no functional change).
Notes
- orphan_monitor, postmortem, and postmortem_collect needed no port:
the Phase 1 package split already carried the orphan-monitor /
postmortem / status-snapshot layer into metal_guard.forensics in
de-Harper-ified form, and metal-guard orphan-scan / metal-guard postmortem are the canonical CLI entry points (superseding the
python -m postmortem_collect shim). Regression-locked by
tests/test_w8_already_public.py.
- No public API change; this is a patch release marking the W8
reconciliation milestone.
v0.18.0 — Drift-audit wave W7 — OOM / prefill layer.
[0.18.0] — 2026-05-18
Drift-audit wave W7 — OOM / prefill layer.
Added
- metal_guard.oom_precheck: high-level pre-call OOM gate wrapping
the prefill-allocation math with model-dims / GPU-memory auto-fetch
(precheck_inference, precheck_inference_or_raise, OOMVerdict).
- metal_guard.prefill_exposure_tracker: per-worker cumulative
prefill-tokens lifetime tracker for the delayed-trigger panic class
(record_prefill, get_state, forget_worker, all_states,
WorkerPrefillState).
- metal_guard.worker_rotation: advisory prefill-budget worker
rotation gate (should_rotate_worker, RotationVerdict,
REASON_DISABLED / REASON_NO_STATE / REASON_UNDER_BUDGET /
REASON_PREFILL_BUDGET_EXCEEDED).
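The tracker and rotation gate listed above fit together roughly as in the sketch below: a per-worker running total of prefill tokens, and an advisory verdict once that total reaches a budget. The names mirror the exported symbols for readability, but every signature, dataclass field, and constant value here is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Constant names come from the note; the string values are assumptions.
REASON_DISABLED = "disabled"
REASON_NO_STATE = "no_state"
REASON_UNDER_BUDGET = "under_budget"
REASON_PREFILL_BUDGET_EXCEEDED = "prefill_budget_exceeded"


@dataclass
class WorkerPrefillState:
    worker_id: str
    cumulative_prefill_tokens: int = 0
    calls: int = 0


@dataclass(frozen=True)
class RotationVerdict:
    rotate: bool
    reason: str


_states = {}


def record_prefill(worker_id, tokens):
    """Add one call's prefill tokens to the worker's lifetime total."""
    state = _states.setdefault(worker_id, WorkerPrefillState(worker_id))
    state.cumulative_prefill_tokens += tokens
    state.calls += 1
    return state


def get_state(worker_id):
    return _states.get(worker_id)


def forget_worker(worker_id):
    """Drop the state, for example after the worker has been rotated."""
    _states.pop(worker_id, None)


def should_rotate_worker(worker_id, budget_tokens, enabled=True):
    """Advisory only: report whether the worker has used up its budget."""
    if not enabled:
        return RotationVerdict(False, REASON_DISABLED)
    state = get_state(worker_id)
    if state is None:
        return RotationVerdict(False, REASON_NO_STATE)
    if state.cumulative_prefill_tokens >= budget_tokens:
        return RotationVerdict(True, REASON_PREFILL_BUDGET_EXCEEDED)
    return RotationVerdict(False, REASON_UNDER_BUDGET)
```

Because the verdict is advisory, the caller decides when to restart the worker and then calls forget_worker to reset the count.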
Changed
- Public API surface grows by 14 names; tests/test_phase1_facade.py
surface lock updated 151 -> 165.
Notes
- prefill_guard needed no port: its full surface already lives in
metal_guard.prefill since the Phase 1 refactor (regression-locked by
tests/test_w7_already_public.py).