Home
Welcome to the provenance-energy-rag-chatbot wiki, the project wiki for the Trustworthy Domain-Specific RAG Chatbot With Provenance.
This project is a document-grounded Retrieval-Augmented Generation (RAG) chatbot designed for technical support in the solar and energy equipment domain. It helps users upload technical manuals, ask questions, retrieve relevant evidence, generate grounded answers, and verify every response through visible citations and source cards.
The goal is to build a reliable AI assistant that helps engineers and technical users quickly find information from uploaded documents such as:
- inverter manuals
- solar/PV module manuals
- battery manuals
- charge controller manuals
- fan manuals
- troubleshooting guides
- maintenance documents
- fault-code tables
The assistant should not guess when the answer is not available in the uploaded documents. Instead, it should clearly say that there is not enough information to answer reliably.
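One way to picture this refusal rule is a small gate in front of the answer generator. The sketch below is illustrative only: `RetrievedChunk`, `answer_or_refuse`, `MIN_RELEVANCE`, and the refusal wording are hypothetical names and values, and it assumes relevance scores where higher means more relevant.

```python
from dataclasses import dataclass

# Hypothetical refusal text and threshold; both would need tuning for a real deployment.
REFUSAL_MESSAGE = "There is not enough information in the uploaded documents to answer reliably."
MIN_RELEVANCE = 0.35


@dataclass
class RetrievedChunk:
    text: str
    filename: str
    relevance: float  # assumed: higher means more relevant, roughly in the range 0..1


def answer_or_refuse(chunks, generate):
    """Call `generate` only when at least one chunk clears the threshold."""
    supported = [c for c in chunks if c.relevance >= MIN_RELEVANCE]
    if not supported:
        # No usable evidence: refuse instead of letting the model guess.
        return REFUSAL_MESSAGE
    # Only the supported chunks are passed on as evidence.
    return generate(supported)
```

Here `generate` stands in for whatever function builds the grounded answer; passing it as an argument keeps the gate independent of any particular LLM provider.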
- Upload PDF, DOCX, TXT, and Markdown documents
- Extract and clean document text
- Split documents into searchable chunks
- Store document chunks in a persistent vector database
- Retrieve relevant passages for a user question
- Generate answers grounded in retrieved evidence
- Show citations and source cards
- Display filename, page number, section, chunk ID, and relevance score
- Refuse unsupported answers in document mode
- Search exact fault codes without calling the LLM
- Cache repeated responses to reduce unnecessary API calls
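Two of the features above, exact fault-code lookup without the LLM and response caching, can be sketched with the standard library alone. Everything named here is an assumption made for illustration: the `FAULT_CODES` entries are made-up placeholders, the code format (one letter followed by two or three digits) is a guess, and `lookup_fault_codes` and `answer` are hypothetical helpers rather than the project's actual API.

```python
import re

# Made-up placeholder entries; real entries would come from uploaded fault-code tables.
FAULT_CODES = {
    "E01": "placeholder description for E01",
    "F12": "placeholder description for F12",
}

# Assumed code format: one letter followed by two or three digits (e.g. E01, F123).
FAULT_CODE_PATTERN = re.compile(r"\b([A-Za-z]\d{2,3})\b")

# In-memory cache keyed by the normalized question text.
_response_cache = {}


def lookup_fault_codes(question):
    """Return {code: description} for every known fault code found in the question."""
    found = {}
    for match in FAULT_CODE_PATTERN.findall(question):
        code = match.upper()
        if code in FAULT_CODES:
            found[code] = FAULT_CODES[code]
    return found


def answer(question, call_llm):
    """Serve from the cache, then the fault-code table, and only then the LLM."""
    key = " ".join(question.lower().split())  # normalize case and whitespace
    if key in _response_cache:
        return _response_cache[key]
    codes = lookup_fault_codes(question)
    if codes:
        result = "; ".join(f"{code}: {text}" for code, text in codes.items())
    else:
        result = call_llm(question)
    _response_cache[key] = result
    return result
```

A persistent cache would also need an invalidation rule, for example clearing entries when new documents are uploaded; this sketch leaves that out.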
- Python
- FastAPI
- uv
- Pydantic
- ChromaDB
- SentenceTransformers
- OpenAI-compatible LLM provider support
- pytest
- ruff
- Streamlit
- httpx
Upload manual
↓
Extract text
↓
Clean and chunk text
↓
Create embeddings
↓
Store chunks in ChromaDB
↓
Ask technical question
↓
Retrieve relevant chunks
↓
Generate grounded answer
↓
Show citations and source cards
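The flow above can be sketched end to end with the standard library. This is a toy stand-in, not the project's implementation: the bag-of-words `embed` function replaces a SentenceTransformers model, a plain Python list replaces ChromaDB, and `Chunk`, `chunk_pages`, `retrieve`, and `source_card` are hypothetical names. The point is only to show how provenance metadata (filename, page number, chunk ID, relevance score) can travel with each chunk from ingestion to the source card.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    chunk_id: str
    filename: str
    page: int
    text: str


def chunk_pages(filename, pages, max_words=40):
    """Split each page into word-bounded chunks, keeping provenance metadata."""
    chunks = []
    for page_number, page_text in enumerate(pages, start=1):
        words = page_text.split()
        for start in range(0, len(words), max_words):
            text = " ".join(words[start:start + max_words])
            chunk_id = f"{filename}-p{page_number}-c{start // max_words}"
            chunks.append(Chunk(chunk_id, filename, page_number, text))
    return chunks


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, top_k=2):
    """Return (chunk, score) pairs sorted by descending similarity."""
    query_vector = embed(question)
    scored = [(chunk, cosine(query_vector, embed(chunk.text))) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def source_card(chunk, score):
    """Build the provenance record shown next to an answer."""
    return {
        "filename": chunk.filename,
        "page": chunk.page,
        "chunk_id": chunk.chunk_id,
        "relevance": round(score, 3),
    }


# Made-up sample pages, used only to exercise the sketch.
sample_pages = [
    "Mount the inverter on a vertical wall.",
    "Fault E01 indicates grid overvoltage.",
]
sample_chunks = chunk_pages("inverter_manual.pdf", sample_pages)
best_chunk, best_score = retrieve("What does fault E01 mean?", sample_chunks, top_k=1)[0]
print(source_card(best_chunk, best_score))
```

In the real pipeline the retrieved chunks would then be passed to the answer generator, and each citation in the answer would link back to one of these source cards.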
- Main Repository: Trustworthy Domain-Specific RAG Chatbot With Provenance
- Backend: FastAPI, ChromaDB, document ingestion, retrieval, and grounded answer generation
- Frontend: Streamlit interface for upload, chat, citations, and fault-code lookup
- Documentation: README, contributing guide, demo flow, evaluation plan, and team plan
- PM: Project coordination, GitHub workflow, documentation, final integration, and demo readiness
- TM-1: Backend/API development and testing
- TM-2: RAG pipeline, retrieval, embeddings, provenance, and fault-code lookup
- TM-3: Frontend, evaluation, demo flow, and UI testing
This project prioritizes trustworthy, document-grounded answers with visible provenance. If the uploaded documents do not contain enough evidence, the assistant should refuse to guess.
Last updated: May 23, 2026