This repo now includes detector runtime code, calibration fixtures, tests, and GitHub workflows. Agents may edit source, tests, docs, evaluation metadata, and workflow files when the task requires it.
Do not edit model artifact directories (`meld_model`, `tmr_model`, `raid_model`)
unless the user explicitly asks for model deployment or refresh.
The repo also ships the `personal_style_pl` package (Polish personal writing-style
similarity). It has its own rules in `personal_style_pl/AGENTS.md` and its design
spec/plan in `docs/superpowers/`. It shares the single Python 3.12 `.venv`
(see `scripts/setup_style_env.sh`).
Keep docs short, operator-focused, and runnable. Document the CLI so that another agent can:
- deploy models
- run inference
- interpret outputs
- validate quickly
- run smoke checks
- use calibration files
README.md MUST include all of the points below:
- Ensemble review summary for MELD / TMR / MAGE
- Explicit local model folders (`meld_model`, `tmr_model`, `raid_model`)
- Hugging Face model sources for each expert and the one-command `ai-detector-deploy --all` deployment path
- Input/Output review:
  - AI / human decision logic
  - `ai_probability` and `human_probability` meaning
  - explicit note that these are raw uncalibrated scores when no calibration is configured
- Inference command forms:
  - `--text`
  - `--text-file`
  - stdin (pipe)
- JSON output mode
- Quick tests and one heavy smoke test
- Limitations for Polish (PL) and OOD inputs
After any doc change, run and verify:
- `ai-detector --help` after package install, or `python3 run_ensemble.py --help` from a checkout fallback
- `ai-detector-deploy --help` and `ai-detector-deploy --list-models` after package install, or `python3 deploy_meld.py --help` and `python3 deploy_meld.py --list-models` from a checkout fallback
- `ai-detector-heuristic --help` after package install, or `python3 heuristic_detector.py --help` from a checkout fallback
- `ai-detector-calibrate --help` after package install, or `python3 calibrate_detector.py --help` from a checkout fallback
- `personal-style-pl --help` after package install, or `python3 -m personal_style_pl.cli --help` from a checkout fallback
- quick checks from README
- heavy smoke test from README
- output keys expected by operator scripts are still present (`experts.*`, `ensemble`, `calibration`)
If anything changes in CLI behavior, update this checklist immediately.
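The installed-or-checkout fallback used by the `--help` checks above can be sketched as follows. This is a minimal illustration, not part of the repo: `HELP_TARGETS`, `help_command`, and `all_help_commands` are hypothetical names, and the only assumption is that an installed entry point is discoverable on `PATH`.

```python
import shutil

# (installed entry point, checkout fallback argv), copied from the checklist above.
HELP_TARGETS = [
    ("ai-detector", ["python3", "run_ensemble.py"]),
    ("ai-detector-deploy", ["python3", "deploy_meld.py"]),
    ("ai-detector-heuristic", ["python3", "heuristic_detector.py"]),
    ("ai-detector-calibrate", ["python3", "calibrate_detector.py"]),
    ("personal-style-pl", ["python3", "-m", "personal_style_pl.cli"]),
]


def help_command(entry_point, fallback, which=shutil.which):
    """Return the argv for a --help check, preferring the installed entry point."""
    # `which` is injectable so the selection logic can be tested without installing anything.
    base = [entry_point] if which(entry_point) else list(fallback)
    return base + ["--help"]


def all_help_commands(which=shutil.which):
    """Return one --help argv per CLI listed in HELP_TARGETS."""
    return [help_command(ep, fb, which) for ep, fb in HELP_TARGETS]
```

Each returned argv can be passed to `subprocess.run(argv, check=True)` from the repo root; the `--list-models` checks would need the same treatment with a different trailing flag.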
If the output contract changes, update README.md and relevant tests in the same turn
and keep negative evaluation outcomes visible (no hiding of poor OOD/PL cases).
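A quick way to confirm the output contract is still intact is to check the JSON output for the keys operator scripts expect. The sketch below assumes `experts`, `ensemble`, and `calibration` are top-level keys of the JSON output object; `missing_output_keys` is a hypothetical helper, not an existing function in this repo.

```python
import json

# Keys named in the validation checklist; assumed here to sit at the top level.
REQUIRED_KEYS = ("experts", "ensemble", "calibration")


def missing_output_keys(raw_json):
    """Return the required keys absent from a JSON output string (empty list means OK)."""
    payload = json.loads(raw_json)
    if not isinstance(payload, dict):
        # A non-object payload cannot carry any of the expected keys.
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if key not in payload]
```

An empty list means all three keys are present; the check only covers presence, not the shape of the values.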
- Keep runtime changes covered by unit tests.
- Treat model weights as external runtime data dependencies, not source files.
  Use `ai-detector-deploy --all` for the packaged Hugging Face model set unless the user explicitly asks for a custom model deployment or refresh.
- Keep source-distribution fixtures in sync through `MANIFEST.in`; tests and calibration examples rely on `data/evaluation/` being present in the sdist.
- Keep negative PL/OOD outcomes visible; do not hide false-positive cases when tuning weights, thresholds, or calibration files.
- Calibration files are operating-point calibrations unless a probability calibration method is explicitly implemented and validated.
- Prefer `ai-detector-daemon` for repeated local scoring on the same machine when per-command model loading dominates runtime.
- Keep runs reproducible: pin thread/environment settings in benchmark and smoke commands (`OMP_NUM_THREADS`, `MKL_NUM_THREADS`, `taskset`, `--threads`, `--device`), and record those settings in notes/artifacts.
- Use JSONL for all daemon traffic (one request object per line) and avoid raw JSON dumps of sensitive text when possible.
- Always shut down the daemon at the end of usage: send `{"command":"shutdown"}`; if control is lost, terminate the process instead.
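The JSONL and shutdown rules above can be sketched as a small serializer. This is an illustration only: `to_jsonl` is a hypothetical helper, and the `text` field in the usage note is an assumed request shape; only the `{"command":"shutdown"}` object comes from this file.

```python
import json

# Shutdown request as stated in the rule above.
SHUTDOWN = {"command": "shutdown"}


def to_jsonl(requests):
    """Serialize request objects as JSONL and always append the shutdown request."""
    # json.dumps escapes newlines inside strings, so each object stays on one line.
    lines = [json.dumps(req, separators=(",", ":")) for req in requests]
    lines.append(json.dumps(SHUTDOWN, separators=(",", ":")))
    return "\n".join(lines) + "\n"
```

For example, `to_jsonl([{"text": "sample"}])` yields two lines: the request, then the shutdown command. Appending the shutdown line in the serializer keeps the daemon from being left running after a batch.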