A lightweight retrieval-augmented documentation assistant built with FastAPI, LangChain, ChromaDB, and Ollama.
It allows developers to query their own documentation or project notes using local language models and embeddings.
DevDocs Assistant provides an API that retrieves relevant document context and generates concise answers using an Ollama model.
It is designed to run locally and fully offline once the models are pulled and your vector database is built. It works with any Ollama chat model, paired with an Ollama embedding model.
- Retrieval-Augmented Generation (RAG): Combines document retrieval with language model reasoning.
- Local Embeddings: Uses Ollama’s nomic-embed-text model for vector creation.
- Modular Model Selection: Defaults to TinyLlama but supports any installed Ollama model.
- FastAPI Backend: Clean and easy API for integration with frontends or other apps.
- Persistent Vector Store: Uses ChromaDB to store and retrieve embedded document chunks.
- Clone the repository
    git clone https://github.com/michaeltsige/Devdocs-Assistant.git
    cd Devdocs-Assistant
- Create and activate a virtual environment
    python3 -m venv .venv
    source .venv/bin/activate
- Install dependencies
    pip install -r requirements.txt
- Ensure Ollama is running
  Install Ollama from ollama.com and pull your desired models (here tinyllama is pulled):
    ollama pull tinyllama
    ollama pull nomic-embed-text
- Build or update the vector store
  Run the script that preprocesses and embeds your docs (build_index.py) before starting the API:
    python build_index.py
  This creates the vectorstore/ directory used for retrieval.
- Run the API server
    python -m uvicorn app:app --reload --host 0.0.0.0 --port 8000
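The setup steps above assume that Ollama is reachable, both models are pulled, and vectorstore/ exists before the server starts. The following is a minimal pre-flight sketch, not part of the repository: the function names are hypothetical, and it assumes Ollama listens on its default local address and exposes its model list at /api/tags.

```python
import json
from pathlib import Path
from urllib.request import urlopen

# Assumption: Ollama's default local address; adjust if yours differs.
OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"
REQUIRED_MODELS = ("tinyllama", "nomic-embed-text")


def installed_model_names(tags_payload: dict) -> set[str]:
    """Extract base model names (without the ':tag' suffix) from an /api/tags payload."""
    names = set()
    for entry in tags_payload.get("models", []):
        name = entry.get("name", "")
        if name:
            names.add(name.split(":", 1)[0])
    return names


def missing_models(tags_payload: dict, required=REQUIRED_MODELS) -> list[str]:
    """Return the required models that do not appear in the payload."""
    installed = installed_model_names(tags_payload)
    return [m for m in required if m not in installed]


def preflight(vectorstore_dir: str = "vectorstore") -> list[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not Path(vectorstore_dir).is_dir():
        problems.append(f"{vectorstore_dir}/ not found; run build_index.py first")
    try:
        with urlopen(OLLAMA_TAGS_URL, timeout=5) as resp:
            payload = json.load(resp)
    except OSError as exc:  # URLError and connection errors are OSError subclasses
        problems.append(f"Ollama not reachable at {OLLAMA_TAGS_URL}: {exc}")
    else:
        for model in missing_models(payload):
            problems.append(f"model not pulled: {model}")
    return problems
```

Calling `preflight()` from the project root and printing the returned list before launching uvicorn would surface a missing index or model early, rather than on the first /ask request.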
Send a POST request to /ask with your question:
    curl -X POST "http://localhost:8000/ask" \
      -H "Content-Type: application/json" \
      -d '{"question": "What is FastAPI?"}'
Response:
    { "answer": " FastAPI is a high-performance web framework for building APIs with Python." }
You can verify the app is running by visiting the base URL:
    GET http://localhost:8000/
Returns:
    { "message": "DevDocs Assistant is running successfully!" }
- Embeddings: Generated using OllamaEmbeddings(model="nomic-embed-text").
- Vector Database: Managed by ChromaDB, persisting embeddings to vectorstore/.
- Retriever: Fetches top-k relevant document chunks (k=5 by default).
- Prompt Template: Combines question and context for clarity.
- Language Model: Uses ChatOllama with TinyLlama (or any chosen Ollama model).
- API: Built with FastAPI for simplicity and scalability.
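To make the retriever and prompt-template steps above concrete, here is a dependency-free sketch of the idea. It is an illustration only. In the app, ChromaDB and LangChain do this work. The cosine-similarity ranking, the function names, and the prompt wording are all assumptions rather than the project's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=5):
    """chunks is a list of (text, vector) pairs; return the k most similar texts."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, chunk[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Combine retrieved chunks and the question into one prompt string (hypothetical wording)."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

In the real pipeline the vectors would come from the embedding model and the resulting prompt would be sent to the chat model. Here, toy vectors are enough to see how `k` limits the context that reaches the prompt.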
You can modify these parameters in app.py or your build script:
- model="tinyllama" → change to any Ollama chat model (e.g., llama3, mistral, phi3)
- collection_name="devdocs" → rename for separate projects
- search_kwargs={"k": 5} → adjust the number of retrieved documents
- temperature=0.3 → tune model response consistency
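If you would rather not edit app.py each time, one option is to gather these four parameters in a single settings object that reads overrides from the environment. This is a sketch under stated assumptions: the `Settings` class, `load_settings`, and the `DEVDOCS_*` variable names are hypothetical and do not exist in the repository. Only the default values are taken from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Defaults mirror the values listed above.
    model: str = "tinyllama"
    collection_name: str = "devdocs"
    k: int = 5
    temperature: float = 0.3


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ by default), falling back to defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        model=env.get("DEVDOCS_MODEL", defaults.model),  # hypothetical variable names
        collection_name=env.get("DEVDOCS_COLLECTION", defaults.collection_name),
        k=int(env.get("DEVDOCS_K", defaults.k)),
        temperature=float(env.get("DEVDOCS_TEMPERATURE", defaults.temperature)),
    )
```

The resulting values could then be passed wherever app.py currently hard-codes them, for example `search_kwargs={"k": settings.k}`.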
- Local: Run with uvicorn as shown above.
- Cloud or containerized: Mount the vectorstore/ directory and ensure Ollama is running.
- Frontend integration: The /ask endpoint can easily power a Streamlit, React, or mobile frontend.
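As a starting point for frontend integration, the sketch below calls /ask from Python using only the standard library. It assumes the request and response shapes shown in the curl example (a JSON body with a "question" field and a response with an "answer" field). The helper names are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def build_ask_request(question: str, base_url: str = "http://localhost:8000") -> Request:
    """Build the POST request for /ask without sending it."""
    body = json.dumps({"question": question}).encode("utf-8")
    return Request(
        f"{base_url.rstrip('/')}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, base_url: str = "http://localhost:8000", timeout: float = 60.0) -> str:
    """Send the question to a running DevDocs Assistant and return the answer text."""
    with urlopen(build_ask_request(question, base_url), timeout=timeout) as resp:
        return json.load(resp)["answer"]
```

With the server running, `print(ask("What is FastAPI?"))` should print the generated answer. The generous timeout is a guess, since local models can be slow on first load.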
Contributions are welcome!
You can help by improving retrieval logic, adding caching, supporting more model configurations, or refining prompt templates.