DocuMind

An open-source document intelligence API. Upload a PDF and ask questions: DocuMind retrieves the most relevant passages, reranks them for accuracy, and returns a cited answer with source page numbers.

Real-world use cases

Law firm uploads contracts → asks "what are the termination clauses?"
Student uploads textbook → asks "explain Newton's third law"
Company uploads HR policy → employees ask questions about leave rules

Architecture

PDF → pdfplumber (load + chunk) → HuggingFace embeddings → Qdrant vector store → Cohere reranking → Groq LLaMA 3.3 → answer + citations
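The "load + chunk" step is what makes page citations possible later on: each chunk has to remember which page it came from. Below is a minimal sketch of that idea in plain Python. The names `Chunk` and `chunk_pages`, and the character-based sizes, are illustrative assumptions and not DocuMind's actual loader code.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int  # 1-based page number, kept so answers can cite their source


def chunk_pages(pages, chunk_size=500, overlap=50):
    """Split each page's text into overlapping chunks tagged with the page number.

    `pages` is a list of strings, one per PDF page (for example, the text
    extracted from each page by a PDF library). Sizes are in characters and
    are illustrative defaults, not the project's real settings.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for page_number, text in enumerate(pages, start=1):
        text = text.strip()
        for start in range(0, len(text), step):
            chunks.append(Chunk(text=text[start:start + chunk_size], page=page_number))
            # Stop once this chunk has reached the end of the page.
            if start + chunk_size >= len(text):
                break
    return chunks
```

Pages with no extractable text (such as scanned pages) produce no chunks in this sketch, which is why they cannot be searched or cited.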

Tech Stack

Python, FastAPI
LangChain — LLM application framework
Qdrant — vector database (local, no Docker needed)
HuggingFace — free local embeddings (all-MiniLM-L6-v2)
Cohere Rerank — improves retrieval accuracy
Groq API — free, fast LLM inference (LLaMA 3.3 70B)

Why this stack

Qdrant over ChromaDB => production-grade, better performance
Cohere reranking => separates basic RAG from accurate RAG
HuggingFace embeddings => free, no API cost, runs locally
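To illustrate why a reranking stage can change results, here is a toy two-stage retrieval in plain Python. Stage 1 ranks chunks by vector similarity; stage 2 reorders the shortlist with a second scorer. The `word_overlap` scorer is a deliberately simple stand-in for a real reranking model and is not the Cohere API; all function names and the dictionary layout are assumptions made for this sketch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, indexed_chunks, top_k=3):
    """Stage 1: rank chunks by vector similarity and keep the top_k."""
    ranked = sorted(
        indexed_chunks,
        key=lambda c: cosine(query_vec, c["vector"]),
        reverse=True,
    )
    return ranked[:top_k]


def rerank(query, candidates, score_fn, top_n=2):
    """Stage 2: reorder the shortlist with a second relevance scorer."""
    ranked = sorted(candidates, key=lambda c: score_fn(query, c["text"]), reverse=True)
    return ranked[:top_n]


def word_overlap(query, text):
    """Placeholder scorer: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0
```

The design point is that stage 1 is cheap and runs over the whole collection, while stage 2 is more expensive and only sees the shortlist, so it can afford to compare the query and each passage more carefully.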

API Endpoints

GET /health — health check
POST /upload — upload a PDF and process it
POST /query — ask a question, get answer + sources
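A minimal client sketch for these endpoints, using only the Python standard library. The request body shape is an assumption: this README does not document the field names, so the `"question"` key below is hypothetical, as is the expectation that responses are JSON. Check the generated schema at /docs before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default address from "How to Run"


def build_query_request(question, base_url=BASE_URL):
    """Build a POST /query request.

    ASSUMPTION: the endpoint accepts a JSON body with a "question" field.
    The real field name may differ; see the schema at /docs.
    """
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_health_request(base_url=BASE_URL):
    """Build a GET /health request."""
    return urllib.request.Request(url=f"{base_url}/health", method="GET")


def send(request, timeout=30):
    """Send a request and decode the response, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_health_request())` checks that the API is up, and `send(build_query_request("what are the termination clauses?"))` asks a question about a previously uploaded PDF.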

How to Run

Clone the repo
Create virtual environment: python -m venv venv
Activate: venv\Scripts\activate (Windows) or source venv/bin/activate (macOS/Linux)
Install: pip install -r requirements.txt
Copy .env.example to .env and add your API keys
Run: uvicorn api:app
Open http://127.0.0.1:8000/docs to test

API Keys Required (all free)

GROQ_API_KEY — console.groq.com
GOOGLE_API_KEY — aistudio.google.com
COHERE_API_KEY — dashboard.cohere.com
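A missing key usually only shows up as an error on the first request, so it can help to check for all three up front. The sketch below assumes the values from .env have already been loaded into the process environment (how DocuMind loads them is not stated here); `missing_keys` is a hypothetical helper name.

```python
import os

REQUIRED_KEYS = ("GROQ_API_KEY", "GOOGLE_API_KEY", "COHERE_API_KEY")


def missing_keys(env=None):
    """Return the required key names that are unset or blank in `env`.

    `env` defaults to os.environ; pass a dict to check other sources.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
```

An empty list means all three keys are present; otherwise the list names the ones still to add to .env.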

Known Limitations

Scanned/image PDFs not supported (no OCR)
Table extraction is basic (complex tables may lose structure)
No hybrid search (vector + keyword) — planned improvement
Single user only - concurrent access requires Qdrant server mode
