Provider-agnostic enterprise RAG and agent evaluation harness for Azure Foundry, vLLM, Ollama, and local demos.
Updated Jun 9, 2026 - Python
Evaluation patterns, release gates, and anti-hallucination techniques for developer-focused AI workflows.
Single-agent, evidence-grounded claim verification to catch LLM hallucinations — a pluggable fact-gate for agent-arena and any multi-agent system (CrewAI, AutoGen, LangGraph).
TypeScript eval harness for measuring whether Grok answers stay grounded in source evidence.
Deterministic citation and claim-support checks for RAG evaluation datasets.
Detect & score LLM hallucinations by groundedness — labeled data, precision/recall/F1, runs offline with no API key. Pluggable LLM-judge backends.
RAG evaluation workbench for retrieval recall, citation coverage, groundedness checks, and failure analysis.
Retrieval-Augmented Generation (RAG) research application with groundedness evaluation (post-hoc hallucination detection) to increase reliability and confidence in LLM-generated output.