This guide walks through every contribution scenario — from idea to published release — using the three project-local skills (/start, /release, /conventions) that automate the workflow.
Before contributing, ensure:
- Claude Code is running inside this repository (
/Users/.../ai-agent-skills) - Project-local skills are loaded — the
.claude/skills/directory is picked up automatically when Claude Code is opened in this repo - GitHub auth — either the GitHub MCP server is connected, or
ghCLI is authenticated:gh auth status # verify gh CLI gh auth login # authenticate if needed
- Python 3 — required for eval scripts:
python3 --version
Every contribution follows the same spine:
/start → cut a branch (from an issue or a description)
↓
do work
↓
run evals → trigger accuracy + functional evals
↓
/release → WRAP UP: update changelog, push, open draft PR
↓
merge on GitHub
↓
/release → CUT RELEASE: tag, push, draft GitHub release
↓
publish on GitHub
The difference between scenarios is what "do work" means and which evals matter.
Start here when: you have an idea for a skill that doesn't exist yet.
/start I need a skill that reviews pull requests and summarises the diff
/start will:
- Classify the request as a feature
- Read
.github/ISSUE_TEMPLATE/feature_request.mdand fill it out from your description - Show you the draft — edit anything, then confirm
- Create the GitHub issue with the
enhancementlabel - Pull latest
mainand cutfeature/<N>-<slug>(e.g.feature/18-pr-review-skill) - Report: "Branch feature/18-pr-review-skill created from main. You're ready to work."
Create the skill directory:
skills/<skill-name>/
├── SKILL.md ← start here
├── references/ ← add if SKILL.md exceeds ~100 lines
│ └── *.md
└── .workspace/
└── evals/
├── evals.json ← write before running evals
├── trigger-evals.json ← write before running trigger tests
└── files/ ← fixture files referenced by evals.json
Use /create-a-skill to help draft and iterate:
/create-a-skill I want to create a skill for reviewing pull requests
If you already have a draft and just want to write evals:
/create-a-skill I need to create an eval for skills/pr-review
evals.json needs at least 2 functional test cases. Each must have a prompt, expected_output, and expectations array with objectively verifiable assertions. See skills/handoff/.workspace/evals/evals.json for a worked example.
trigger-evals.json needs exactly 20 queries — 10 that should trigger the skill, 10 that should not. Tricky near-misses make the best negatives. See skills/handoff/.workspace/evals/trigger-evals.json for a worked example.
cd ~/.claude/skills/create-a-skill
python3 -m scripts.run_loop \
--eval-set /path/to/ai-agent-skills/skills/<skill-name>/.workspace/evals/trigger-evals.json \
--skill-path /path/to/ai-agent-skills/skills/<skill-name> \
--model claude-sonnet-4-6 \
--max-iterations 5 \
--verboseTarget: 95% (19/20 correct). If the script finds a better description, copy best_description from the output into the description field in SKILL.md frontmatter.
Repeat until you hit 95%.
/create-a-skill I want to run the functional evals for skills/<skill-name>
Review the graded outputs. Iterate on SKILL.md until the skill produces correct outputs consistently.
On your feature branch with all changes committed:
/release
/release (WRAP UP phase) will:
- Detect you're on a feature branch
- Detect a new skill was added
- Add a
### Addedbullet toCHANGELOG.mdunder## [Unreleased] - Add a row to the
README.mdskills table - Add an entry to
.claude-plugin/marketplace.json - Commit those admin changes
- Push the branch
- Open a draft PR with the Skill Checklist pre-filled
- Print the PR URL
Review the PR on GitHub, mark it ready for review, and merge.
After the PR is merged, switch to main:
git checkout main
git pull origin mainThen:
/release
/release (CUT RELEASE phase) will:
- Determine the next semver version from commits (
feat→ minor bump) - Show you the proposed version and ask for confirmation
- Rename
[Unreleased]inCHANGELOG.mdto[X.Y.Z] - YYYY-MM-DD - Insert a fresh
[Unreleased]section - Update the comparison links at the bottom of
CHANGELOG.md - Commit:
chore(release): cut vX.Y.Z release - Tag:
vX.Y.Z(annotated) - Push the commit and tag
- Create a draft GitHub release
- Tell you to go publish it manually
Start here when: a skill is producing wrong output, following bad steps, or behaving unexpectedly.
If a GitHub issue already exists:
/start #23
If you're filing the bug yourself:
/start The vault-scribe skill generates invalid YAML frontmatter when the title contains a colon
/start (NEW ISSUE mode) will classify this as a bug, fill bug_report.md, and cut a fix/<N>-<slug> branch.
Before editing, decide which layer is broken:
| Problem type | Where to fix |
|---|---|
| Wrong steps or missing steps | SKILL.md body |
| Wrong output format | SKILL.md body or reference file |
| Outdated or incorrect reference material | references/*.md |
| Skill not triggering reliably | description field in frontmatter |
Make your fix, then run the functional evals to confirm the bug is resolved:
/create-a-skill I want to run the functional evals for skills/<skill-name>
If the bug was an activation problem, run trigger accuracy after updating the description:
cd ~/.claude/skills/create-a-skill
python3 -m scripts.run_loop \
--eval-set /path/to/ai-agent-skills/skills/<skill-name>/.workspace/evals/trigger-evals.json \
--skill-path /path/to/ai-agent-skills/skills/<skill-name> \
--model claude-sonnet-4-6 \
--max-iterations 5 \
--verbose/release
/release will detect changes inside an existing skill directory and add a ### Fixed bullet to CHANGELOG.md. It will not touch README.md or marketplace.json (only new skills trigger those updates).
After merging:
git checkout main && git pull origin main
/releaseA bug fix commit (fix type) → patch semver bump.
Start here when: you want to add a capability, improve output quality, or restructure a skill — but nothing is technically broken.
/start #17
or
/start The design-critique skill should support async API design in addition to sync
Edit SKILL.md (and reference files if needed). Re-run functional evals after your changes:
/create-a-skill I want to run the functional evals for skills/<skill-name>
If you change the description, also run trigger accuracy and confirm you haven't regressed below 95%.
/release
/release detects a modified skill (not a new one) and adds a ### Changed bullet to CHANGELOG.md. An enhancement commit (feat type) → minor bump at release time.
Start here when: a skill exists and works correctly but is not activating reliably — either it fires on the wrong prompts, or it fails to fire on the right ones.
Run a baseline against all 20 eval queries:
cd ~/.claude/skills/create-a-skill
python3 -m scripts.run_loop \
--eval-set /path/to/ai-agent-skills/skills/<skill-name>/.workspace/evals/trigger-evals.json \
--skill-path /path/to/ai-agent-skills/skills/<skill-name> \
--model claude-sonnet-4-6 \
--max-iterations 1 \
--holdout 0 \
--report none \
--verboseExamine which queries failed:
- False negatives (should trigger, didn't): description is too narrow or missing key phrases
- False positives (shouldn't trigger, did): description is too broad or overlaps with another skill
python3 -m scripts.run_loop \
--eval-set /path/to/ai-agent-skills/skills/<skill-name>/.workspace/evals/trigger-evals.json \
--skill-path /path/to/ai-agent-skills/skills/<skill-name> \
--model claude-sonnet-4-6 \
--max-iterations 5 \
--verboseThe script will iterate on the description, improve it, and report the best score. If best_score reaches 95%, copy best_description from the JSON output into the description field in SKILL.md frontmatter.
Once at 95%:
/release
An activation-only fix uses fix type → patch bump.
Start here when: you've found a problem but aren't going to fix it yourself.
Open an issue directly on GitHub, or:
/start The handoff skill's RESUME mode ignores the constraints field entirely
/start will draft and create the bug issue. Stop there — no branch needed if you're not fixing it.
skills/<skill-name>/
├── SKILL.md Required
│ ├── --- (YAML frontmatter)
│ │ ├── name: Kebab-case, max 64 chars
│ │ ├── description: Max 1024 chars — what it does + when to trigger
│ │ └── allowed-tools: Tools this skill may call
│ └── (Markdown instructions)
├── references/ Optional — one level deep only
│ └── *.md
└── .workspace/
└── evals/
├── evals.json Required — functional test cases
├── trigger-evals.json Required — 20 trigger accuracy queries
└── files/ Optional — input fixtures
- Directory name: lowercase-hyphenated, max 64 characters
-
SKILL.mdfrontmatter hasname,description,allowed-tools -
namematches the directory name exactly -
descriptionwritten in third person with trigger words and "Make sure to use this skill whenever..." -
SKILL.mdbody under 500 lines (100 lines if simple — move detail toreferences/) - Reference files one level deep — no chains
- No secrets, credentials, or PII anywhere
-
evals.jsonhas at least 2 test cases withexpectationsarrays -
trigger-evals.jsonhas exactly 20 queries (10 should-trigger, 10 should-not-trigger) - Trigger accuracy ≥ 95% (19/20 correct) before opening a PR
-
marketplace.jsonupdated (new skill only) -
README.mdskills table updated (new skill only)
The description field is the only thing Claude reads when deciding whether to load a skill. It must be self-contained.
Do:
- Start with what the skill produces or does: "Generates...", "Reviews...", "Saves..."
- Include specific trigger phrases the user would actually say
- Use "Make sure to use this skill whenever..." to prevent under-triggering
- Name specific file types, keywords, or contexts that distinguish this skill
Don't:
- Start with "Use this to..." (weak framing)
- Be so broad that it conflicts with other skills
- Bury the trigger conditions at the end
Tests whether the skill description activates on the right prompts. Uses run_loop.py from the create-a-skill skill.
cd ~/.claude/skills/create-a-skill
python3 -m scripts.run_loop \
--eval-set skills/<skill-name>/.workspace/evals/trigger-evals.json \
--skill-path skills/<skill-name> \
--model claude-sonnet-4-6 \
--max-iterations 5 \
--verboseThe script runs up to 5 iterations, improving the description each time the score is below the threshold. When it exits, check best_description in the JSON output — if it's better than the current description, apply it to SKILL.md.
Target: 95% (19/20 correct).
Note on project-local workflow skills: The three skills in
.claude/skills/(start,release,conventions) are designed for explicit slash-command invocation and score ~50% on trigger accuracy. This is expected and acceptable — see each skill'sREADME.mdfor the explanation. The 95% target applies to skills inskills/only.
Tests whether the skill produces the right output. Uses /create-a-skill to spawn parallel subagents.
/create-a-skill I want to run the functional evals for skills/<skill-name>
Results land in .workspace/iteration-N/. Review the graded outputs — each expectations entry should show passed: true. Iterate on SKILL.md until outputs are consistently correct.
Open a GitHub issue and include:
- The skill name and version or commit SHA
- The exact prompt you used
- What you expected to happen
- What actually happened
- The Claude model and tool (e.g. Claude Code CLI, claude.ai)
Or use /start to file it directly from Claude Code.
By contributing, you agree that your contributions will be licensed under the project's MIT License.