Ratchet Review is an open-source Claude Code skill that runs an independent AI agent to review your AI-generated output before it ships. It catches factual errors, score inflation, and buried conclusions using a 5-layer adversarial quality gate with calibrated scoring and a ratchet mechanism that ensures each revision improves on the last.
Before: "This report looks great." After: "This report is great AND the client name is actually spelled right."
AI writes fluently. It doesn't write carefully:
- A consulting firm's AI-generated government report cited fabricated sources -- the contract was partially refunded (Deloitte Australia)
- Multiple lawyers sanctioned after submitting AI-hallucinated legal citations to court -- fines exceeding $10K, public apologies (Originality.AI)
- An AI travel guide listed a food bank as a "must-visit dining experience" (Microsoft Copilot)
- 77% of daily AI users now review AI output as carefully as human colleagues' work (Workday/Hanover Research, n=3,200)
The root cause: asking the same AI to check its own work is structurally blind. The model that made the mistake will defend it. (Nature, 2025)
Ratchet Review runs a separate Critic agent that never sees the writer's reasoning -- only the deliverable and the reader's profile.
Five layers, each independently verifiable:
Layer 0 Ground Truth Fact-check names, numbers, dates against
your source data. Any error = blocked.
Layer 1 Subagent Critic Independent agent. Only sees the deliverable
+ reader persona. No shared context with writer.
Layer 2 Judge Score across 5 dimensions with calibrated
anchor cases (4.5 / 7.0 / 8.5). No inflation.
Layer 3 Ratchet Each revision must score higher than the last.
Regression = automatic rollback.
Layer 4 Learning Loop Track what improved, feed insights upstream.
bash install.sh
# or manually:
cp -r skill/ ~/.claude/skills/ratchet-review/After installing, open Claude Code and try:
Write me a short email to the Grand Hyatt about our event.
Then say: "review my output"
Watch Layer 0 fact-check the hotel name, Layer 1 critique the structure, and Layer 2 score it -- all before you hit send.
Say any of these to trigger a review:
- "review my output"
- "quality check"
- "polish this"
- "check this"
- "proofread"
- "fact check"
The skill auto-triggers based on keywords. No setup needed beyond install.
| What | How | Where |
|---|---|---|
| Reader Personas | Define who reads your output -- reading time, priorities, pet peeves | examples/personas/ |
| Scoring Anchors | Calibrate what fail / borderline / pass looks like for YOUR context | examples/anchors/ |
| Format Rules | Add dimension rules for specific output formats | skill/ directory |
See Customization Guide for details.
| Format | Review Dimensions |
|---|---|
| Email / DOCX / PDF | Hook, Structure, Evidence, Actionability, Conciseness |
| PPTX | Above + Image Quality, Anti-AI Layout |
| Design / Mockup | Physics, Brand Compliance, IP Accuracy, Composition, Purpose Fit |
Each format loads its own rule file automatically. PPTX and Design rules are included out of the box.
Most AI quality tools target developers and CI pipelines (DeepEval, Guardrails AI, NeMo Guardrails). None of them review an executive email, a board deck, or a marketing report.
This skill was built for that gap -- 3 months, 5+ retail brands, dozens of executive deliverables. It caught a misspelled venue name 2 hours before a CEO presentation. Layer 0 flagged it in seconds by cross-referencing the knowledge base.
In the 4 major tools we benchmarked, none combine independent agent review
- reader personas + calibrated scoring anchors + a ratchet mechanism. This is a verifiable positioning gap, not a category claim.
Methodology grounded in:
- Adversarial content review (Fu Sheng) -- opposition improves quality
- AutoResearch Ratchet Loop (Karpathy, 2026) -- each iteration must measurably improve, never regress
+--------------------------------------------------+
| Any Claude Code skill that produces output |
| (email writer, PPTX creator, report builder) |
+------------------------+-------------------------+
v
+--------------------------------------------------+
| Ratchet Review (this skill) |
| |
| Ground Truth -> Subagent Critic -> Judge |
| -> Ratchet (max 3 rounds) -> Learning Loop |
+------------------------+-------------------------+
v
Reviewed output
(score >= 8.0 or flagged for human review)
See Architecture Deep Dive for the full specification.
Q: Does this work with other AI tools (ChatGPT, Copilot, Cursor)? A: No. This is a Claude Code skill that uses Claude's Agent tool for the independent Critic. It's designed for the Claude Code ecosystem.
Q: Will this slow down my workflow? A: A full review cycle takes 30-60 seconds. Skip it for drafts -- the skill has built-in exclusion rules (< 50 words, raw data, code, internal notes).
Q: Can I use this for code review? A: Not designed for it. This targets natural-language deliverables (reports, emails, presentations, designs). Code has better dedicated tools.
Q: What if it keeps failing after 3 rounds? A: The skill stops automatically, outputs the best version so far, and gives you an unresolved issues list. You decide what to ship.
Q: Can I add my own review dimensions?
A: Yes. Add format-specific rule files in skill/ following the pattern
of PPTX-RULES.md or DESIGN-RULES.md.
PRs welcome. Especially:
- New format rule modules (e.g., newsletter, social media post, video script)
- Reader persona templates for different industries
- Translations of trigger keywords
- Scoring anchor examples from your domain
- Bug reports and improvement suggestions