isaacus-dev / semchunk

A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

python nlp text splitting chunking text-chunking text-splitting semantic-chunking isaacus

Updated Jun 13, 2026
Python

mirth / chonky

Fully neural approach for text chunking

ai ml chunking rag text-splitter llms semantic-chunking

Updated Oct 23, 2025
Python

jparkerweb / semantic-chunking

🍱 Semantically create chunks from large documents for passing to LLM workflows

vector embeddings chunking text-splitter llm text-chunking text-splitting semantic-chunking equill-library

Updated May 29, 2026
JavaScript

mburaksayici / RAG-Boilerplate

RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo sessioning, Celery ingestion pipeline, Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs).

ai-agents reranking rag vector-database hybrid-search qdrant llm retrieval-augmented-generation rag-evaluation semantic-chunking crewai rag-pipeline propositional-models query-enhancement

Updated Nov 18, 2025
Python

poloniki / quint

Turn speech into semantic paragraphs in real time, on a CPU (no GPU) — single-pass streaming chunker + Moonshine STT + a live mic demo. Plus a transcribe/chunk/summarize API.

python nlp streaming real-time podcast embeddings openai summarization speech-to-text transcription gradio whisper moonshine fastapi semantic-chunking model2vec

Updated Jun 17, 2026
Jupyter Notebook

zircote / rlm-rs

Rust CLI implementing the Recursive Language Model (RLM) pattern for Claude Code. Process documents 100x larger than context windows through intelligent chunking, SQLite persistence, and recursive sub-LLM orchestration.

Updated Jun 15, 2026
Rust

prajwal10001 / semantic-chunker-langchain

Token-aware, LangChain-compatible semantic chunker with PDF, markdown, and layout support

python nlp markdown pdf ai rag langchain semantic-chunking

Updated Jun 28, 2025
Python

jparkerweb / llm-distillery

🍶 llm-distillery ⇢ use LLMs to run map-reduce summarization tasks on large documents until a target token size is met.

text-summarization text-processing tokenization text-compression token-management openai-api llm large-language-model semantic-chunking text-distillation ai-text-reduction equill-library

Updated May 12, 2026
JavaScript

ThanhHung2112 / Semantic_chunking

Semantic Chunking is a Python library for segmenting text into meaningful chunks using embeddings from Sentence Transformers.

nlp text vector chunking rag text-split vector-database semantic-chunking

Updated Dec 15, 2024
Python

bazilicum / axonode-chunker

Advanced semantic text chunking with custom structural markers, whole-text coherence preservation, and flexible token management. Features async processing, LangChain integration, and dynamic drift detection. Ideal for RAG systems, augmented text processing, and domain-specific document analysis.

lang rag test-split langchain semantic-chunking text-spl

Updated Aug 10, 2025
Python

wigtn / wigtnOCR-v1

A research framework to evaluate how document parsing quality determines downstream RAG performance.

benchmark ocr evaluation rag document-parsing semantic-chunking

Updated Apr 3, 2026

darkzard05 / rag-system-ollama

Advanced local-first RAG system powered by Ollama and LangGraph. Optimized for high-performance sLLM orchestration featuring adaptive intent routing, semantic chunking, intelligent hybrid search (FAISS + BM25), and real-time thought streaming. Includes integrated PDF analysis and secure vector caching.

python nlp semantic-search reranking faiss rag fastapi streamlit vector-database hybrid-search langchain pdf-chat local-ai ollama semantic-chunking langgraph ai-orchestration sllm thought-streaming

Updated Mar 21, 2026
Python

njyeung / go-semantic-chunking

Semantic chunking algorithm in (mostly) Go

vector embeddings chunking semantic-segmentation text-splitter text-chunking semantic-chunking retreival-augmented-generation

Updated Feb 6, 2026
Go

edycutjong / ContextWeaver

AI-native data annotation pipeline using Dynamic In-Context Learning (ICL) routing. Leverages RAG to retrieve semantically relevant few-shot examples for long-context document chunking and targeted LLM prompting.

react python nlp ai nextjs data-annotation icl tailwind-css rag fastapi vector-database in-context-learning llm long-context chromadb qwen semantic-chunking

Updated Apr 26, 2026
TypeScript

Jayandhan03 / HR-Asst-rag

HR Policy Assistant (RAG-based chatbot): a conversational AI assistant for employees to query company HR policies. Built with LangChain and Qdrant, it semantically ingests HR documents, retrieves relevant policy information, reranks results with BM25/MMR, and delivers precise LLM-generated responses. Cloud-based vector storage ensures quick responses.

streamlit-webapp dense-retrieval huggingface-spaces langchain hybrid-retrieval qdrant-vector-database semantic-chunking rag-chatbot

Updated Oct 15, 2025
Python

anujmumbaikar / Optimizing-RAG-Pipeline-Hybrid-Search-RRF-Fusion-Re-Ranking

A high-performance Retrieval-Augmented Generation pipeline for technical Q&A workloads. Combines hybrid retrieval (dense + BM25), query expansion, Reciprocal Rank Fusion (RRF), and cross-encoder re-ranking to improve retrieval precision and answer grounding. Evaluated with Ragas, showing measurable gains in context recall and faithfulness.

hybrid-search cross-encoder semantic-chunking reciprocal-rank-fusion ragas-evaluation

Updated Apr 30, 2026
Jupyter Notebook

v1jaysundaram / rag-with-langgraph

A hands-on guide to RAG techniques using LangGraph.

mmr hyde hype reranking rag semantic-chunking contextual-compression context-window-enhancement sub-query-decomposition contextual-chunk-headers

Updated May 7, 2026
Jupyter Notebook

ian-cowley / Glacier.DocTree

High-performance, zero-dependency Markdown parser and semantic chunking tree for RAG and LLM agent contexts.

csharp dotnet high-performance markdown-parser rag semantic-chunking llm-context

Updated Jun 9, 2026
C#

smart-models / Progressive-Summarizer-RAPTOR

Cutting-edge semantic text processing system that uses hierarchical clustering and advanced language models to automatically organize and summarize large volumes of text.

docker rest-api gpu-acceleration raptor hierarchical-clustering rag llm semantic-chunking ollama-integration progressive-summarization

Updated Mar 15, 2026
Python

hoangtung386 / semantic-qdrant-pipeline

A modular RAG pipeline for automated document processing using Semantic Chunking and Qdrant Vector Database.

python nlp rag vector-database qdrant semantic-chunking uv-package

Updated Apr 18, 2026
Python

semantic-chunking

Here are 42 public repositories matching this topic...