ai-quality

Here are 63 public repositories matching this topic...

Giskard-AI / awesome-ai-safety

📚 A curated list of papers & technical articles on AI Quality & Safety

Updated Apr 14, 2025

subodhkc / llmverify-npm

AI model health monitor for LLM apps – runtime checks for drift, hallucination risk, latency, and JSON/format quality on any OpenAI, Anthropic, or local client.

ai runtime-metrics ai-safety drift-detection llm ai-quality ai-code-review hallucination-detection llm-monitoring ai-observalibility

Updated Mar 30, 2026
TypeScript

greynewell / evaldriven.org

Ship evals before you ship features.

Updated Jun 22, 2026
Nunjucks

greynewell / matchspec

Eval framework. Define correct, test against it, get results.

Updated Feb 17, 2026
Go

vishwanathakuthota / openvals

Open-source AI model evaluation and benchmarking framework for LLMs (OpenAI, Ollama, Claude, Gemini)

machine-learning gemini openai ai-safety ai-agents ai-evaluation ai-testing ai-quality llm-tools ollama llm-benchmarking ai-evaluation-framework calude ai-reliability vishwanath-akuthota

Updated Jun 24, 2026
Python

aws-samples / sample-GEDD

Find what your AI agent gets wrong — before you have a rubric. Qualitative eval for PMs.

python product-management ai-agents grounded-theory prompt-engineering ai-testing ai-quality amazon-bedrock llm-evaluation eval-framework

Updated Jun 19, 2026
Python

converra / agent-triage

Diagnose your AI agents in production. Extract policies from prompts, evaluate traces, generate diagnostic reports.

Updated Mar 10, 2026
TypeScript

DUBSOpenHub / shadow-score-spec

A framework-agnostic metric for measuring AI code generation quality. Sealed-envelope testing protocol + reference validators.

metrics specification code-quality ai-agents ai-quality testing-methodology gap-score sealed-envelope

Updated May 27, 2026
Python

Ryo-Hunter / suzaku

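Suzaku (朱雀): an AI generation-quality module. Sycophancy suppression, constructive challenge, output adaptation, context anchoring, and consistency guarding. Based on the LDRIT design.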

ai-safety claude ai-assistant ai-tools ai-agent llm prompt-engineering ai-quality claude-code anti-sycophancy ldrit

Updated Apr 11, 2026

jkorzeniowski / safeagentguard

Open-source AI agent security testing framework. Test for prompt injection, data leakage, and privilege escalation before production.

python ai-safety ai-agents security-testing red-teaming ai-security ai-quality prompt-injection llm-testing

Updated Feb 23, 2026
Python

yousufwaqar / llm-eval-harness

Provider-agnostic LLM evaluation harness: golden dataset, deterministic + LLM-as-judge scoring, RAG failure attribution, red-team suite, severity-weighted CI gate.

typescript evaluation test-automation sdet rag llm ai-quality llm-evaluation llm-as-judge

Updated Jun 5, 2026
TypeScript

ivycheck / ivycheck-python-sdk

Python SDK for IvyCheck

ai gpt ai-security generative-ai ai-quality generative-ai-security-assurance

Updated Apr 17, 2024
Jupyter Notebook

HubWizard / second-pass

Universal skill enhancement layer for Claude Code. Sees what your skill was trying to do, grades the gap, drives the rewrite.

ai-quality anthropic skill-enhancement claude-code claude-skills meta-skill

Updated Apr 29, 2026

syncreus / syncreus-eval

Evaluate your LLM apps with one function call. Hallucination detection, RAG scoring, and agent evals for OpenAI, Anthropic, and more. 14 evaluators, pytest plugin, composite trust scores.

Updated Apr 3, 2026
Python

TeamSPWK / nova

AI Agent Ops framework for Claude Code — independent evaluator, adversarial review, and pre-commit quality gate for AI-generated code.

developer-tools code-review ai-agents llm prompt-engineering ai-quality anthropic mcp-server agent-ops claude-code claude-code-plugin harness-engineering

Updated Jun 8, 2026
Shell

Rofi7777 / ratchet-review

A 5-layer adversarial quality gate for Claude Code. Catches factual errors, score inflation, and buried conclusions before your AI output ships.

quality-assurance ratchet ai-quality llm-as-judge claude-code claude-code-skill adversarial-review ai-hallucination llm-quality-gate ai-output-review

Updated Apr 9, 2026
Shell

nshkrdotcom / Assessor

The definitive CI/CD platform for AI Quality.

testing elixir otp ai functional-programming continuous-integration beam erlang-vm quality-assurance cicd ai-testing ai-quality ml-quality quality-platform nshkr-archive

Updated Apr 9, 2026
Elixir

josephsenior / agent-evaluation-platform

🚀 Professional-grade AI Agent Evaluation Platform. Multi-provider LLM-as-a-Judge (OpenAI, Anthropic, Gemini), automated testing, A/B benchmarking, and safety auditing.

Updated Dec 26, 2025
Python

duomimimi / QualityForge

Self-improving AI quality system - auto-score, auto-regenerate, auto-learn.

testing automation self-improvement ai-quality

Updated Jun 1, 2026
Python

Qalipso / ai-evaluation-tool

Evidence-backed quality control for LLM outputs: rubrics, claim grounding, safety gates, human review, and reports.

typescript nextjs openai rubrics supabase ai-quality llm-evaluation hallucination-detection llm-as-judge safety-gates

Updated Jun 7, 2026
TypeScript
