retrieval-evaluation

Here are 25 public repositories matching this topic...

mangopy / tool-retrieval-benchmark

Official code for the ACL 2025 paper "🔍 Retrieval Models Aren’t Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models"

information-retrieval embedding-models large-language-models tool-learning retrieval-evaluation

Updated Dec 22, 2025
JavaScript

kidist-amde / amharic-ir-benchmarks

Official codebase for the ACL 2025 Findings paper: Optimized Text Embedding Models and Benchmarks for Amharic Passage Retrieval.

information-retrieval bert bm25 passage-retrieval ndcg text-embedding amharic-corpus mrr roberta amharic-nlp huggingface-transformers colbert multilingual-nlp low-resource-nlp dense-retrieval amharic-language retrieval-evaluation academic-benchmark

Updated Jul 26, 2025
Jupyter Notebook

mburaksayici / smallevals

smallevals: offline retrieval evaluation for RAG systems using tiny QA models; fast on CPU and faster still on GPU.

qa chroma question-generation weaviate qa-generation milvus vector-database qdrant chromadb rag-evaluation tiny-llm retrieval-evaluation offline-evaluation retrieval-metrics

Updated Dec 4, 2025
Python

codychampion / arxiv-embedding-benchmark

Published PyPI package for ArXiv embedding benchmarks, retrieval evaluation, and scientific RAG experiments.

nlp benchmarking retrieval embeddings arxiv model-evaluation rag pypi-package scientific-ml retrieval-evaluation

Updated May 22, 2026
Python

GoparapukethaN / rag-forge

RAG retrieval benchmark runner with JSON reports, Pareto plots, and regression gates for retrieval quality changes.

python nlp benchmark information-retrieval embeddings regression-testing bm25 reranking quality-gates rag llm llmops retrieval-augmented-generation retrieval-evaluation retrieval-benchmark

Updated May 20, 2026
Python

toniIepure25 / FMRI2images

image-reconstruction neuroscience pytorch medical-imaging neuroimaging nsd fmri representation-learning clip uncertainty-estimation probabilistic-modeling multimodal-learning self-supervised-learning brain-decoding contrastive-learning stable-diffusion openclip retrieval-evaluation vision-reconstruction

Updated May 29, 2026
Python

AKIVA-AI / toolkit-rag-quality

Deterministic RAG evaluation toolkit -- retrieval metrics (recall, precision, MRR), corpus overlap detection, and CI regression gating without model calls.

python information-retrieval ci-cd data-quality rag retrieval-evaluation

Updated May 26, 2026
Python

BlackthornEmpire / RAGScaleGuard

Open-source retrieval diagnostics toolkit for enterprise RAG pipelines

python open-source information-retrieval retrieval haystack rag vector-search llm langchain llmops llamaindex retrieval-augmented-generation rag-evaluation retrieval-evaluation enterprise-rag rag-diagnostics

Updated May 16, 2026
Python

rajantripathi / soas-rag-evaluation

Bilingual RAG evaluation benchmark for culturally grounded English/Uzbek retrieval

multilingual nlp benchmark information-retrieval evaluation uzbek low-resource-languages rag retrieval-augmented-generation enterprise-ai retrieval-evaluation multilingual-ai corpus-engineering

Updated Jun 23, 2026
Python

wjohns989 / Muninn

Local-first memory infrastructure for coding workflows: deterministic retrieval, explainable traces, MCP/REST/SDK interfaces, and standalone browser-first operation.

Updated May 14, 2026
Python

rajantripathi / open-course-rag-benchmark

Open multilingual RAG benchmark for retrieval-grounded educational question answering

benchmark openstax rag hybrid-search education-ai retrieval-evaluation multilingual-ai

Updated Apr 23, 2026
Python

SrabanMondal / NSHG-RAG

Research-grade neuro-symbolic RAG framework where retrieval is a policy, not a vector search, built for evaluation, ablation, and reliability analysis.

information-retrieval semantic-search umap bm25 hierarchical-clustering faiss rag neuro-symbolic ablation-study hybrid-retrieval knowledge-retrieval retrieval-evaluation llm-systems graph-retrieval

Updated Jan 4, 2026
Python

Kevin-Li-2025 / signal-rag

Search and retrieval workbench with query planning, multi-source retrieval, citation checking, source-trust tiers, and extractive fallback.

search rag query-planning llm-agents citation-verification retrieval-evaluation

Updated May 29, 2026
Python

Arnav-Ajay / rag-systems-foundations

A systems-level analysis of static RAG pipelines, isolating ingestion, retrieval, and ranking boundaries to expose structural failure modes before generation.

information-retrieval evaluation ranking chunking system-design rag ai-systems failure-modes hybrid-retrieval retrieval-augmented-generation retrieval-evaluation llm-systems

Updated Jan 24, 2026

Arnav-Ajay / rag-hybrid-retrieval

A controlled experiment evaluating whether hybrid (dense + sparse) retrieval surfaces evidence that dense-only RAG systems misrank—without changing generation behavior.

bm25 rag hybrid-search dense-retrieval sparse-retrieval retrieval-evaluation

Updated Jan 23, 2026
Python

Kevin-Li-2025 / retrieval-eval

RAG retrieval quality evaluation and regression testing toolkit with golden sets, recall/MRR metrics, reports, and CI-friendly outputs.

python benchmark openai regression-testing rag qdrant retrieval-evaluation

Updated May 29, 2026
Python

Kevin-Li-2025 / coreb-retrieval-sota

Reproducible CoREB retrieval benchmark snapshot with CI-backed evaluation artifacts and result provenance.

benchmark reproducibility rag llm-evaluation coreb retrieval-evaluation

Updated May 29, 2026
Python

eswar06 / rag-evaluation-dashboard

RAG Evaluation Playground — Visualize, compare, and evaluate retrieval performance across different chunking strategies, embeddings, and reranking approaches.

semantic embeddings semantic-search observability reranking rag vector-search ai-engineering retrieval-evaluation

Updated Jun 21, 2026
TypeScript

importrayhan / QPP_4_ASSISTANT

Query performance prediction (QPP) for clarification-need prediction in context-grounded multi-turn conversations. Clean implementations of QPP baselines suitable for multi-turn conversational datasets, optionally with ranked documents. Designed to detect ambiguous search queries.

natural-language-processing chatbot-framework statistical-models conversational-agents query-understanding query-performance-prediction retrieval-augmented-generation retrieval-evaluation multi-turn-conversation

Updated May 21, 2026
Python

FishRaposo / rag-evaluation-lab

RAG evaluation framework: hit-rate, MRR, faithfulness scoring, and async batch evaluation with golden question datasets

python evaluation rag pgvector llm-evaluation rag-evaluation llm-observability retrieval-evaluation golden-dataset

Updated Jun 16, 2026
TypeScript
