kunalsuri/ai-fication-kit

ai-fication-kit — legacy → AI-native, with a human in the loop



Apache 2.0 License · Node 18+ · Python 3.8+ · Works with Claude Code · CI

Make any repository AI-native — with a human in the loop.


A Toolkit to Give AI Coding Agents a Trusted Map of Any Existing/Legacy Repo

  • Drafted by AI Agents, verified by Humans, and kept mechanically honest.

  • One command scaffolds it, and depending on the complexity of the codebase, it can be made trustworthy in 30 minutes to a few hours.

  • Two outcomes from one workflow: it makes your codebase AI-native, and it produces AI-Powered Repo Intelligence — a human-approved knowledge-base (ai/) that lets a new teammate onboard instantly.

Tip

Brand new here? Follow the one linear path in docs/GETTING-STARTED.md (zero → trusted map in five steps), and keep the Glossary open for any unfamiliar term ([inferred], Stability, slash command, …). New to AI coding agents specifically? Jump to the 2-minute primer first.

⚡ Quick Start

Get up and running in under five minutes.

Prerequisites: Node.js ≥ 18 or Python ≥ 3.8 — pick whichever you prefer. Both installers are feature-identical and zero-dependency (stdlib only, no packages to install).

1️⃣ Run the Scaffolder

Select one of the options below depending on your stack and preferences:

Option A: Direct via npx (No Clone Required, JS/TS Developers)

Run the installer directly using npx against the GitHub repository:

# 1 · Preview the installation (writes nothing, dry-run)
npx github:kunalsuri/ai-fication-kit shazam /path/to/your/repo --dry-run

# 2 · Run the live installation
npx github:kunalsuri/ai-fication-kit shazam /path/to/your/repo

Note

Future publishing note: We plan to publish this kit to the public npm registry. Once published, you'll be able to run npx ai-fication-kit shazam /path/to/your/repo directly.

Tip

First run is guided. The first time you shazam a repo, a read-only maturity check runs first (no LLM, no writes) and determines if this is a legacy repo (Process 1 — create everything from scratch) or a modern repo that already has CLAUDE.md/AGENTS.md (Process 2 — back up existing files, then install). Then a short wizard (4–5 questions) sets you up safely: it warns you if you're on main/master, lets you confirm the stack, and (for Process 2) explains the backup flow. Your answers are saved to ai/repo-profile.json. Pass --yes to skip.

Option B: Local Clone (Node.js or Python Developers)

Clone the repository and run the scripts locally (pure Node.js or Python stdlib):

# Clone the repository
git clone https://github.com/kunalsuri/ai-fication-kit.git
cd ai-fication-kit

# Run with Node.js
node install.mjs shazam /path/to/your/repo

# OR run with Python (pure stdlib, no external dependencies)
python install.py shazam /path/to/your/repo

Note

The name shazam is inspired by the magic word: the idea is to transform a repository into an AI-native repository with a single command. Under the hood, shazam runs check-repo-maturity → orient → starts the intake wizard → stamps the intelligence layer → prints clear next steps.


2️⃣ Initialize Agent Maps

Open your target repository in Claude Code (or your agent of choice) and run:

/cold-start

This command runs for ~5 minutes as the agent scans the code and drafts the initial map.


3️⃣ Conduct Your Human Audit

Open ai/guide/MODULE_MAP.md to review the generated draft:

  1. Define each module's Stability (frozen / stable / ours / ?).
  2. Mark verified entries as [verified].
  3. Keep the docs mechanically honest — at any time, cross-check every file-path claim in the maps against the real tree (deterministic, no LLM):
node install.mjs verify /path/to/your/repo        # or: python install.py verify ...
# writes ai/analysis/audit-reports/VERIFICATION_MANIFEST.json + a readable report
# add --strict to fail (exit 1) on stale claims, e.g. in CI
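The idea behind verify can be illustrated with a short sketch. This is not the kit's implementation: the function name `find_stale_paths` and the convention that path claims appear as backtick-quoted relative paths are assumptions made purely for illustration.

```python
import re
import tempfile
from pathlib import Path

# Assumption: path claims are backtick-quoted relative paths containing a slash.
PATH_CLAIM = re.compile(r"`([^`\s]+/[^`\s]*)`")


def find_stale_paths(map_text: str, repo_root: Path) -> list:
    """Return claimed paths that do not exist under repo_root, in first-seen order."""
    stale = []
    for claim in PATH_CLAIM.findall(map_text):
        if claim not in stale and not (repo_root / claim).exists():
            stale.append(claim)
    return stale


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "src" / "app.py").write_text("print('hi')\n", encoding="utf-8")
        sample = "| `src/app.py` | entry point |\n| `src/gone.py` | removed |\n"
        # Only the second claim points at a file that is missing from the tree.
        print(find_stale_paths(sample, root))
```

A check like this is deterministic: the same map and the same tree always yield the same list, which is what makes it suitable for a CI gate.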

As the code evolves, drift keeps the map honest the other way — it reports code the map has stopped covering (deterministic; the opt-in --git flag adds a stale check):

node install.mjs drift /path/to/your/repo         # or: python install.py drift ...
# reports unmapped directories + vanished entries; add --git for stale [verified] rows
# add --strict to fail (exit 1) on any drift, e.g. in CI
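As a rough illustration of the two drift categories named above (unmapped directories and vanished entries), the comparison can be thought of as a set difference in each direction. The sketch below is hypothetical: `detect_drift`, the `IGNORED` set, and the restriction to top-level directories are assumptions, not the kit's actual behavior.

```python
import tempfile
from pathlib import Path

# Assumption: tooling directories that a map would not be expected to cover.
IGNORED = frozenset({".git", ".claude", "ai", "node_modules"})


def detect_drift(repo_root: Path, mapped_dirs: set) -> dict:
    """Compare top-level directories on disk with the directories the map lists."""
    on_disk = {
        p.name for p in repo_root.iterdir() if p.is_dir() and p.name not in IGNORED
    }
    return {
        "unmapped": sorted(on_disk - mapped_dirs),  # code the map no longer covers
        "vanished": sorted(mapped_dirs - on_disk),  # map rows with no directory left
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("src", "tools", "ai"):
            (root / name).mkdir()
        # The map knows "src" and a since-deleted "legacy", but not "tools".
        print(detect_drift(root, {"src", "legacy"}))
```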

Need to adjust options? Override them: --name, --build, --test, --upstream. Changed your mind? Cleanly remove everything:

node install.mjs uninstall /path/to/your/repo

Want a read-only diagnostic first? Run the maturity check standalone:

node install.mjs check-repo-maturity /path/to/your/repo
# writes ai/analysis/audit-reports/MATURITY_REPORT.json (no LLM, no code changes)

Tip

The audit is the step that makes everything else trustworthy. See docs/AUDIT-GUIDE.md for a step-by-step walkthrough, and docs/FAQ.md for answers to common questions.


🔄 How It Works

Flowchart showing the ai-fication-kit workflow: maturity check, orient scan, install scaffolding, cold-start inference, human audit, verify checks, and add-feature development

The Maturity Gate: One Command, Two Paths

Before any files are written, shazam runs a read-only maturity check that inspects 11 aspects of your repo, including version control, build system, tests, CI/CD, docs, dependencies, code structure, license, and, crucially, whether CLAUDE.md/AGENTS.md already exist. The result determines which installation process runs:

| | Process 1 (Legacy) | Process 2 (Modern) |
| --- | --- | --- |
| Trigger | No existing CLAUDE.md / AGENTS.md, or only kit-generated ones | User-authored CLAUDE.md and/or AGENTS.md detected (no kit footer) |
| What happens | Everything created from scratch | Existing files backed up with timestamp, then templates installed |
| Backup files | None | e.g. CLAUDE_bkp_20260617_221847.md, AGENTS_bkp_20260617_221847.md |
| /cold-start | Standard: drafts ai/guide/ from code | Step 0.5 first: extracts knowledge from backups, then drafts |
| Uninstall | Standard removal | Reports backup file locations (never deletes them) |
| Re-runs | Safe (skips existing unless --force) | Creates new timestamped backups (no conflicts) |
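The Process 1 / Process 2 trigger can be pictured as a simple footer test. The following is a minimal sketch under stated assumptions: the marker string is taken from this README's description of the kit footer, while `choose_process` and the exact matching rule are hypothetical and may differ from what the installer does.

```python
import tempfile
from pathlib import Path

# Marker text as described in this README; the matching rule is an assumption.
KIT_FOOTER = "<!-- Installed by ai-fication-kit"
AI_CONFIG_FILES = ("CLAUDE.md", "AGENTS.md")


def choose_process(repo_root: Path) -> int:
    """Return 2 if any user-authored config exists (no kit footer), else 1."""
    for name in AI_CONFIG_FILES:
        path = repo_root / name
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if KIT_FOOTER not in text:
            return 2  # hand-written file found: back up before installing
    return 1  # nothing to preserve: create everything from scratch


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "AGENTS.md").write_text("# Team rules\n", encoding="utf-8")
        print(choose_process(root))
```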

You can also run the maturity check standalone — no LLM, no writes, just a diagnostic:

node install.mjs check-repo-maturity /path/to/your/repo
# Prints a scored report + writes ai/analysis/audit-reports/MATURITY_REPORT.json

Process Flow Diagram

The diagram below shows both installation paths from initial scan through to safeguarded development:

flowchart TD
    %% ─── Entry ───
    Repo[("Your Repository")] --> Maturity["0. check-repo-maturity<br/>(11 deterministic checks)"]
    Maturity --> Decision{"User-authored<br/>CLAUDE.md / AGENTS.md?"}

    %% ─── Process 1: Legacy ───
    Decision -- "No / Kit-generated<br/>→ Process 1" --> Orient1["1. orient<br/>(detect stack)"]
    Orient1 --> Profile1[["ai/repo-profile.json<br/>(process: 1)"]]
    Profile1 --> Install1["2. install<br/>(stamp templates)"]
    Install1 --> Scaffolded1[["CLAUDE.md + AGENTS.md<br/>+ ai/ knowledge layer"]]

    subgraph P1["Process 1 — Legacy Repo (from scratch)"]
        Orient1
        Profile1
        Install1
        Scaffolded1
    end

    %% ─── Process 2: Modern ───
    Decision -- "Yes, user-authored<br/>→ Process 2" --> Backup["1a. Backup existing files<br/>CLAUDE_bkp_YYYYMMDD_HHmmss.md"]
    Backup --> Orient2["1b. orient<br/>(detect stack)"]
    Orient2 --> Profile2[["ai/repo-profile.json<br/>(process: 2)"]]
    Profile2 --> Install2["2. install<br/>(overwrite after backup)"]
    Install2 --> Scaffolded2[["CLAUDE.md + AGENTS.md<br/>+ ai/ + timestamped backups"]]

    subgraph P2["Process 2 — Modern Repo (backup + upgrade)"]
        Backup
        Orient2
        Profile2
        Install2
        Scaffolded2
    end

    %% ─── Shared: Cold-Start ───
    Scaffolded1 --> ColdStart
    Scaffolded2 --> ColdStart

    subgraph AgentLoop["Agent Loop (Inferred)"]
        ColdStart["3. /cold-start<br/>(agent infers maps)"]
        Step05{"Process 2?<br/>Backup files exist?"}
        ColdStart --> Step05
        Step05 -- "Yes" --> Extract["Step 0.5: Extract knowledge<br/>from *_bkp_*.md backups"]
        Extract --> InferredMap
        Step05 -- "No" --> InferredMap[["MODULE_MAP.md<br/>[inferred]"]]
    end

    %% ─── Shared: Human Gate ───
    subgraph HumanGate["Human Gate (Trust Verification)"]
        InferredMap --> Audit["4. Human Audit<br/>(set Stability, flip tags)"]
        Audit --> VerifiedMap[["MODULE_MAP.md<br/>[verified]"]]
    end

    %% ─── Shared: Dev Loop ───
    subgraph DevLoop["Development (Safeguarded)"]
        VerifiedMap --> Verify["5. verify + drift<br/>(mechanical checks)"]
        Verify --> AddFeature["6. /add-feature<br/>(safe implementation)"]
    end

    %% ─── Styling ───
    class Repo repo;
    class Maturity,Decision gate;
    class Orient1,Install1,Backup,Orient2,Install2 setup;
    class Profile1,Scaffolded1,Profile2,Scaffolded2,InferredMap,VerifiedMap files;
    class ColdStart,Step05,Extract,Verify,AddFeature agent;
    class Audit human;

    classDef repo fill:#2d3748,stroke:#1a202c,stroke-width:2px,color:#fff;
    classDef gate fill:#d69e2e,stroke:#975a16,stroke-width:2px,color:#fff;
    classDef setup fill:#2b6cb0,stroke:#1a365d,stroke-width:2px,color:#fff;
    classDef files fill:#553c9a,stroke:#322659,stroke-width:2px,color:#fff;
    classDef agent fill:#2f855a,stroke:#1c4530,stroke-width:2px,color:#fff;
    classDef human fill:#c53030,stroke:#742a2a,stroke-width:2px,color:#fff;

    style P1 fill:none,stroke:#2b6cb0,stroke-width:1.5px,stroke-dasharray: 5 5;
    style P2 fill:none,stroke:#d69e2e,stroke-width:1.5px,stroke-dasharray: 5 5;
    style AgentLoop fill:none,stroke:#2f855a,stroke-width:1.5px,stroke-dasharray: 5 5;
    style HumanGate fill:none,stroke:#c53030,stroke-width:1.5px,stroke-dasharray: 5 5;
    style DevLoop fill:none,stroke:#00a3c4,stroke-width:1.5px,stroke-dasharray: 5 5;
Process 1 — Legacy Repo (No Existing AI Config)

This is the original flow. The kit creates everything from scratch:

  1. check-repo-maturity → detects no user-authored CLAUDE.md/AGENTS.md → Process 1
  2. orient → reads marker files, writes ai/repo-profile.json with maturity.process: 1
  3. install → stamps all templates (CLAUDE.md, AGENTS.md, ai/ tree)
  4. /cold-start → agent drafts ai/guide/ docs from the code, all tagged [inferred]
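Downstream tooling can branch on the process that orient recorded. The sketch below reads that value back; it assumes `maturity.process` is stored as a nested JSON object inside ai/repo-profile.json, which this README implies but does not spell out, and `read_process` is a hypothetical helper name.

```python
import json
import tempfile
from pathlib import Path


def read_process(repo_root: Path) -> int:
    """Return the installation process (1 or 2) recorded in the repo profile."""
    profile_path = repo_root / "ai" / "repo-profile.json"
    profile = json.loads(profile_path.read_text(encoding="utf-8"))
    # Assumption: "maturity.process" means {"maturity": {"process": ...}}.
    process = profile["maturity"]["process"]
    if process not in (1, 2):
        raise ValueError(f"unexpected process value: {process!r}")
    return process


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "ai").mkdir()
        (root / "ai" / "repo-profile.json").write_text(
            json.dumps({"maturity": {"process": 1}}), encoding="utf-8"
        )
        print(read_process(root))
```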

Process 2 — Modern Repo (Existing AI Config)

For repos that already have a hand-written CLAUDE.md or AGENTS.md:

  1. check-repo-maturity → detects user-authored files (no <!-- Installed by ai-fication-kit footer) → Process 2
  2. Backup → copies CLAUDE.md → CLAUDE_bkp_20260617_221847.md (timestamped, never conflicts)
  3. orient → reads marker files, writes ai/repo-profile.json with maturity.process: 2
  4. install → overwrites the backed-up files with kit templates, stamps ai/ tree
  5. /cold-start Step 0.5 → reads *_bkp_*.md files, extracts knowledge (conventions, architecture, gotchas, module descriptions) → merges into ai/guide/ docs tagged [inferred — from prior config]
  6. /cold-start continues normally → drafts remaining docs from code
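The backup naming in step 2 follows the pattern `<stem>_bkp_YYYYMMDD_HHmmss.md`. A minimal sketch of how such a name could be derived is shown here; `backup_name` and `backup_file` are illustrative helpers, not functions from the kit.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_name(original: str, now: datetime) -> str:
    """Map e.g. CLAUDE.md to CLAUDE_bkp_YYYYMMDD_HHmmss.md."""
    path = Path(original)
    return f"{path.stem}_bkp_{now.strftime('%Y%m%d_%H%M%S')}{path.suffix}"


def backup_file(path: Path, now: datetime) -> Path:
    """Copy path next to itself under a timestamped name and return the copy."""
    target = path.with_name(backup_name(path.name, now))
    shutil.copy2(path, target)  # the original stays in place until install overwrites it
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "CLAUDE.md"
        source.write_text("# hand-written rules\n", encoding="utf-8")
        copy = backup_file(source, datetime(2026, 6, 17, 22, 18, 47))
        print(copy.name)  # CLAUDE_bkp_20260617_221847.md
```

Because the timestamp has one-second resolution, re-runs more than a second apart produce distinct file names.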

Important

Nothing is lost. Backup files are preserved through uninstall. The prior config becomes seed knowledge for the new ai/ layer — the best of both worlds.

The 7-Step Workflow

| Step | Owner | Description |
| --- | --- | --- |
| 0️⃣ check-repo-maturity | Script (Seconds) | Read-only diagnostic. 11 deterministic checks (version control, build system, tests, CI/CD, docs, locks, code structure, license, AI config). Scores 0–100, determines Process 1 or 2. No LLM, no writes. |
| 1️⃣ orient | Script (Seconds) | Deterministic observation. Reads marker files (package.json, pom.xml, pyproject.toml, *.csproj/*.sln, CMakeLists.txt, go.mod, Cargo.toml, etc.) and writes ai/repo-profile.json (languages, build/test commands, fork status, maturity data). No LLM. Nothing executed. |
| 2️⃣ install | Script (Seconds) | Scaffolding. Process 2: backs up existing files first. Then stamps templates into your repository. Records every written file in an install manifest so uninstall can perform a clean removal. |
| 3️⃣ /cold-start | Agent (~5 Mins) | Model inference. Process 2: Step 0.5 extracts knowledge from *_bkp_*.md backups first. Then drafts MODULE_MAP.md, diagrams, and candidate features. Every claim is tagged [inferred]. |
| 4️⃣ Your Audit | You (~30 Mins) | The trust verification. Review the map, set module stability (frozen / stable / ours / ?), and flip confirmed rows to [verified]. |
| 5️⃣ Verify (Optional) | Script + Agent | Stability checks. verify (script, no LLM) mechanically cross-checks every file-path claim in the docs against the real tree → VERIFICATION_MANIFEST.json + report. Then /post-cold-start-verification (semantic gap report), /verify-ai-readiness (maturity rating), or /perform-feature-add-simulation (simulated friction check). |
| 6️⃣ /add-feature | Agent | Safeguarded development. The agent builds specs, navigates using the maps, runs tests, and updates the knowledge layer without touching frozen code. |
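To make the orient step more concrete, here is a hedged sketch of marker-file detection. The marker file names come from this README; the stack labels, the `detect_stacks` name, and the choice to look only at the repository root are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Marker files named in this README; the stack labels are illustrative only.
MARKERS = {
    "package.json": "Node.js",
    "pom.xml": "Java (Maven)",
    "pyproject.toml": "Python",
    "CMakeLists.txt": "C/C++ (CMake)",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
}
GLOB_MARKERS = {"*.csproj": ".NET", "*.sln": ".NET"}


def detect_stacks(repo_root: Path) -> list:
    """List stacks whose marker files sit at the repo root. Nothing is executed."""
    found = {label for name, label in MARKERS.items() if (repo_root / name).is_file()}
    for pattern, label in GLOB_MARKERS.items():
        if any(repo_root.glob(pattern)):
            found.add(label)
    return sorted(found)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "go.mod").write_text("module example\n", encoding="utf-8")
        (root / "package.json").write_text("{}", encoding="utf-8")
        print(detect_stacks(root))
```

Checking for file presence only, without running any build tool, is what keeps a step like this deterministic and side-effect free.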

📦 What You Get

This kit scaffolds a minimal, highly structured knowledge directory inside your target repository. Once /cold-start has populated it and a human has verified it, this ai/ directory is your AI-Powered Repo Intelligence — the knowledge-base that both agents and new teammates read to get up to speed:

your-repo/
├── CLAUDE.md                   # auto-loaded by Claude Code (thin; points everywhere else)
├── AGENTS.md                   # same rules for Cursor, Copilot, Codex, Windsurf
├── CLAUDE_bkp_*.md             # (Process 2 only) timestamped backup of prior config
├── AGENTS_bkp_*.md             # (Process 2 only) timestamped backup of prior config
├── ai/
│   ├── INDEX.md                # role → path manifest (prompts reference roles, not paths)
│   ├── repo-profile.json       # machine-readable facts from orient (deterministic)
│   ├── install-manifest.json   # what the installer wrote (for clean uninstall)
│   ├── guide/                  # navigation, loaded every session
│   │   ├── MODULE_MAP.md       # directory → responsibility → Stability  ← START HERE
│   │   ├── PROJECT_OVERVIEW.md · ARCHITECTURE.md · FEATURE_MAP.md · CONVENTIONS.md
│   ├── analysis/               # generated artifacts, loaded on demand
│   │   ├── FEATURE_CATALOG.md  # feature → files index (+ _BACKEND/_FRONTEND splits)
│   │   ├── diagrams/           # Mermaid; regenerate, don't hand-maintain
│   │   ├── audit-reports/      # verification, drift, & maturity reports
│   │   └── problems/           # dated analyses of specific issues
│   └── lab/                    # development intelligence: specs/, decisions/ (ADRs),
│                                 evaluations/, experiments/
└── .claude/                    # commands (/cold-start, /add-feature, …),
                                  subagents (repo-explorer, feature-builder, test-runner),
                                  and the add-feature skill

Directory Structure Highlights

  • Root Guides (CLAUDE.md / AGENTS.md): Thin root files that point the agent to the ai/ folder.
  • Knowledge Guide (ai/guide/): Core maps (MODULE_MAP.md is your starting point!), conventions, and architectural overviews loaded by the agent every session — and, once verified, the first thing a new team member reads to onboard.
  • Analysis Outputs (ai/analysis/): Deep analytical results generated by the agent (e.g. diagrams, feature catalogs, and problems logs).
  • Lab Space (ai/lab/): A dedicated area for specifications (RFCs), architecture decision records (ADRs), and evaluations.
  • Agent Operations (.claude/): Reusable slash commands, helper subagents (repo-explorer, feature-builder, test-runner), and custom agent skills.

🌉 The Bridge to AI-Native Onboarding

Diagram showing a bridge connecting legacy codebase complexity on the left to AI-native developer workflow on the right, with the ai/ knowledge layer as the bridge span

For engineers onboarding onto a complex codebase, the learning curve is historically steep. AI coding agents can accelerate this transition, but they get lost without a reliable map.

This kit acts as a bridge: combining a minimal knowledge store (the ai/ folder) with automated tooling to help developers and AI agents collaborate safely. It is designed to help engineers adapt and become AI-native very fast.

Running this kit delivers two outcomes at once:

1. It makes your codebase AI-native. Agents stop guessing. They read a compact, provenance-tracked map instead of re-crawling the tree every session, so they edit the right module and respect what's off-limits.

2. It produces AI-Powered Repo Intelligence. When you run /cold-start, the agent gathers everything it can learn about the repository (module responsibilities, architecture, feature touch-points, conventions, diagrams) and writes it into the ai/ folder. A human then approves it (the [inferred] → [verified] flip). At that point ai/ is no longer scaffolding: it is a verified knowledge-base, a single trustworthy source of truth about the repo that both humans and agents can rely on.

🚀 Instant onboarding for new team members

Once the knowledge-base is verified, the payoff isn't limited to AI agents — it's for people too.

Historically, a new engineer joining a complex or legacy codebase spends days (sometimes weeks) reverse-engineering it: which module does what, what's safe to touch, where a feature actually lives, why a decision was made. That tribal knowledge usually lives in a few people's heads.

With a verified ai/ knowledge-base in place, a new teammate can onboard almost instantly:

  • They read ai/guide/MODULE_MAP.md to see, at a glance, every module, its responsibility, and whether it's frozen / stable / ours.
  • They follow PROJECT_OVERVIEW.md, ARCHITECTURE.md, and FEATURE_MAP.md for the why and the where.
  • They (or their AI agent) can ask questions against the knowledge-base and trust the answers, because a human signed off on every [verified] claim.

The same human-verified map that keeps AI agents honest becomes the fastest onboarding doc your team has ever had — and because the verify step keeps it mechanically honest, it stays accurate as the code evolves.


🛡️ The Problem & The Solution

Side-by-side comparison: left panel shows a chaotic legacy repo with scattered files and no context, right panel shows the same repo with structured ai/ knowledge maps providing clear navigation

🛑 The Problem: The Agent Context Tax

AI coding agents (such as Claude Code, Cursor, Copilot) are highly capable, but they are context-blind on large or legacy repositories.

  • Token Burn: They re-read the directory tree every session, burning through your context window.
  • Guesswork: They guess which files are safe to modify.
  • Dangerous Hallucinations: An agent-hallucinated map is worse than no map: the agent will confidently edit the wrong module.

✅ The Solution: A Provenance-Tracked Map

The answer isn't to rewrite your code. It's to give the agent a provenance-tracked map where every claim must be validated by you:

  • [inferred] ➔ Scaffolds and maps drafted by the AI agent or installer.
  • [verified] ➔ Human-checked and confirmed repository facts.
  • 🚫 Strict Security: AI agents are forbidden from marking their own drafts as [verified]. The flip is your signature.
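One way to see how far an audit has progressed is to tally the provenance tags in a map. The sketch below assumes one tag per line and a hypothetical helper name, `tally_tags`; the real layout of MODULE_MAP.md may differ.

```python
import re

# Matches [inferred] and [verified], including variants with a trailing note
# inside the brackets (for example a prior-config annotation).
TAG = re.compile(r"\[(inferred|verified)\b[^\]]*\]")


def tally_tags(map_text: str) -> dict:
    """Count lines carrying each provenance tag (first tag on a line wins)."""
    counts = {"inferred": 0, "verified": 0}
    for line in map_text.splitlines():
        match = TAG.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join(
        [
            "| src/core | request routing | stable | [verified] |",
            "| src/legacy | old billing code | frozen | [inferred] |",
            "| tools | build helpers | ours | [inferred] |",
        ]
    )
    counts = tally_tags(sample)
    print(counts)
    print("audit complete" if counts["inferred"] == 0 else "rows still awaiting review")
```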

🤖 New to AI Coding Agents? Start Here

Illustrated glossary of AI coding concepts: agent, context window, slash commands, provenance tags, and subagents, each with a short definition and icon

If slash commands and "context windows" are new to you, here is a quick terminology orientation:

🤖 AI Coding Agent: An autonomous assistant (like Claude Code, Cursor, or Copilot) that goes beyond simple autocomplete. It can read files, execute terminal commands, and perform edits across your codebase.

💻 Claude Code: Anthropic's command-line coding agent. In the Claude Code interface, commands are prefixed with a slash (like /cold-start or /add-feature).

🧠 Context Window & Tokens: The active working memory of an AI agent. Because large codebases easily overwhelm this memory, this kit builds a compact ai/ directory map so the agent reads key maps instead of crawling the entire project.

🏷️ Provenance Tagging: The trust boundaries of the repository:

  • [inferred]: Scaffolding and drafts generated automatically by the AI agent.
  • [verified]: Human-checked, confirmed claims. AI agents are forbidden from marking their own drafts as [verified].

👥 Subagents: Helper assistant processes (repo-explorer, feature-builder, test-runner) spawned by the main agent to perform specific, isolated tasks.

Using Cursor, Copilot, or Codex instead of Claude Code?

Those tools read AGENTS.md (the rules and the knowledge map), but slash commands and subagents are Claude Code-specific. With other tools, you drive the workflow by hand — e.g. paste the contents of .claude/commands/cold-start.md as a prompt to run the cold-start pass.


🔒 Security & Trust Guarantees

We designed the installer to be lightweight and safe:

  • 🪶 Zero Dependencies – Node stdlib / Python stdlib only. No external npm packages.
  • 🔒 No Network or Execution – It only copies and stamps text files. No remote API calls or arbitrary code runs.
  • 🛡️ Safe Scoping – It only writes files inside your target directory.
  • 🔍 Dry-Run Support – Run with --dry-run to see exactly what files will be created before writing anything.
  • 🧹 Clean Removal – The installer writes ai/install-manifest.json. The uninstall command reads it to remove exactly what was written; Process 2 backup files are reported and left in place.
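The safe-scoping and clean-removal guarantees can be combined in a manifest-driven uninstall. The following is a sketch only: the manifest schema (`{"files": [...]}`) and the `uninstall` helper are assumptions, and the real installers should be read for the authoritative behavior.

```python
import json
import tempfile
from pathlib import Path


def uninstall(repo_root: Path, manifest_rel: str = "ai/install-manifest.json") -> list:
    """Delete only files listed in the manifest that resolve inside repo_root."""
    root = repo_root.resolve()
    manifest = root / manifest_rel
    # Assumption: the manifest is JSON shaped like {"files": ["CLAUDE.md", ...]}.
    listed = json.loads(manifest.read_text(encoding="utf-8"))["files"]
    removed = []
    for rel in listed:
        target = (root / rel).resolve()
        try:
            target.relative_to(root)  # raises ValueError when outside the repo
        except ValueError:
            continue  # never touch anything beyond the target directory
        if target.is_file():
            target.unlink()
            removed.append(rel)
    manifest.unlink()
    return removed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "repo"
        (repo / "ai").mkdir(parents=True)
        (repo / "CLAUDE.md").write_text("stamped\n", encoding="utf-8")
        (repo / "ai" / "install-manifest.json").write_text(
            json.dumps({"files": ["CLAUDE.md"]}), encoding="utf-8"
        )
        print(uninstall(repo))
```

Files that were never recorded in the manifest, such as timestamped backups, are untouched by this approach.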

For more details, read both installers or refer to SECURITY.md.


⚖️ How This Toolkit Differs

While other tools scaffold files or evaluate repositories, this kit focuses on trust through provenance, with the human as the authority:

| Design Pillar | How We Implement It |
| --- | --- |
| Deterministic Scan vs. Model Inference | A strict separation between deterministic environment checks (orient) and model generation (/cold-start). |
| Provenance Tracking | The strict [inferred] → [verified] progression ensures you always know what has been human-checked. |
| Fork-Aware Stability | Classified stability markers (frozen / stable / ours / ?) prevent the agent from touching upstream or legacy modules. |
| Active Verification | The verify command deterministically cross-checks every file-path claim in the knowledge docs against the source tree (manifest + report, no LLM); agent workflows then cover the semantic checks a script cannot judge. |
| Drift Detection | The drift command catches the reverse problem as code evolves: directories the map no longer covers, entries that have vanished, and (with --git) [verified] rows whose code changed. The map ages with the repo instead of silently rotting. |
| Dual-Mode Installation | Automatic detection of legacy vs. modern repos. Process 2 preserves prior knowledge through timestamped backups and feeds it into /cold-start as seed intelligence, so no user work is lost. |
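The stability markers lend themselves to a mechanical guard. As a hypothetical illustration (the kit does not necessarily ship such a check), the sketch below flags changed files that fall under a module marked frozen; the `frozen_violations` name and the dictionary form of the stability map are assumptions.

```python
from pathlib import PurePosixPath


def frozen_violations(changed_files: list, stability: dict) -> list:
    """Return changed files that live under a module marked 'frozen'."""
    frozen = [PurePosixPath(d) for d, level in stability.items() if level == "frozen"]
    hits = []
    for name in changed_files:
        path = PurePosixPath(name)
        # Compare whole path components so "vendor/upstream2" is not mistaken
        # for a child of "vendor/upstream".
        if any(root == path or root in path.parents for root in frozen):
            hits.append(name)
    return hits


if __name__ == "__main__":
    stability = {"vendor/upstream": "frozen", "src/app": "ours", "src/lib": "stable"}
    changed = ["vendor/upstream/core.c", "src/app/main.py", "vendor/upstream2/x.c"]
    print(frozen_violations(changed, stability))  # ['vendor/upstream/core.c']
```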

🤝 Contributing

See CONTRIBUTING.md for guidelines. Issues, example repos, and template improvements are the most helpful contributions right now. The project is pre-v1.0 and maintained by a single author — feedback from running the kit on real legacy repositories is especially valuable.


📖 Citation

If you use this kit in academic or research work, please cite it:

@software{suri2026aificationkit,
  author    = {Suri, Kunal},
  title     = {ai-fication-kit: a human-auditable method and kit for making legacy repositories AI-native},
  year      = {2026},
  url       = {https://github.com/kunalsuri/ai-fication-kit},
  version   = {0.1.0},
  license   = {Apache-2.0}
}

See CITATION.cff for the machine-readable format.


📄 License

This project is licensed under the Apache License 2.0.


🙏 Acknowledgments

Developed at CEA LIST, the French Alternative Energies and Atomic Energy Commission.

Author: Kunal Suri (@kunalsuri)
