🐋 DeepSeek-R1 - Document RAG (Retrieval-Augmented Generation)

📌 Overview

This project implements a Retrieval-Augmented Generation (RAG) system for document-based question answering using:

DeepSeek-R1 (via Groq) for fast & efficient AI-powered responses.
ChromaDB for storing and retrieving document embeddings.
HuggingFace Embeddings for vectorization.
Streamlit for an interactive UI.

🔹 Users can upload a PDF, process it into a vector database, and query it for answers!

🚀 Features

📄 Upload and process PDF documents into a vector store.
🔍 Retrieve relevant document sections using ChromaDB.
💬 Generate AI-powered answers using DeepSeek-R1 via Groq.
🎛 Interactive UI built with Streamlit.

📂 Project Structure

/project-folder
│-- main.py                # Streamlit UI for user interaction
│-- rag_utility.py         # Core logic for document processing & retrieval
│-- config.json            # Stores API keys
│-- requirements.txt       # Required dependencies

🛠 Installation & Setup

1️⃣ Clone the Repository

git clone https://github.com/AymenGabsi/rag_deepseek_groq_doc_qa.git
cd rag_deepseek_groq_doc_qa

2️⃣ Set Up a Virtual Environment

python -m venv venv
source venv/bin/activate   # For macOS/Linux
venv\Scripts\activate     # For Windows

3️⃣ Install Dependencies

pip install --upgrade pip setuptools wheel
pip install -r requirements.txt

4️⃣ Configure API Keys

Open config.json and replace your_groq_api_key with your actual Groq API Key:

{
  "GROQ_API_KEY": "your_groq_api_key"
}
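
One way an app could read this file at startup is sketched below. The helper name `load_groq_api_key` and the choice to export the key as the `GROQ_API_KEY` environment variable are assumptions for illustration; the actual loading code in rag_utility.py may differ.

```python
import json
import os
from pathlib import Path


def load_groq_api_key(config_path="config.json"):
    """Read GROQ_API_KEY from a JSON config file and export it (hypothetical helper)."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    api_key = config.get("GROQ_API_KEY", "")
    # Fail early if the placeholder value was never replaced.
    if not api_key or api_key == "your_groq_api_key":
        raise ValueError("Set a real GROQ_API_KEY in config.json")
    # Assumption: downstream clients read the key from the environment.
    os.environ["GROQ_API_KEY"] = api_key
    return api_key
```

Calling `load_groq_api_key()` once before the UI starts would surface a missing or placeholder key immediately rather than on the first question.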

▶️ Running the Application

After setting up, launch the Streamlit app:

streamlit run main.py

This will open a web interface where you can:

Upload a PDF
Ask questions about its content
Receive AI-generated answers powered by DeepSeek-R1

🏗 How It Works

Upload a PDF file.
Extract text and convert it into vector embeddings using HuggingFace.
Store embeddings in ChromaDB for efficient retrieval.
Retrieve relevant sections from ChromaDB when a user asks a question.
Send the retrieved sections, together with the user's question, to the DeepSeek-R1 LLM (via Groq) for final answer generation.
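
The steps above can be sketched in plain Python to show the data flow. This is a toy illustration, not the project's code: the bag-of-words `embed` function stands in for a HuggingFace embedding model, the in-memory `retrieve` function stands in for a ChromaDB query, and all function names and chunk sizes are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), step)]


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (stand-in for a vector store)."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_similarity(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, context_chunks):
    """Combine the retrieved context and the question into one LLM prompt."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a real pipeline the string returned by `build_prompt` would be sent to the LLM; here it only shows how the retrieved sections and the question end up in the same request.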

📌 Technologies Used

LangChain for orchestrating LLM-based workflows.
DeepSeek-R1 via Groq for LLM-based response generation.
ChromaDB for vector storage and retrieval.
HuggingFace Embeddings for document vectorization.
Streamlit for UI interaction.

🤖 Future Enhancements

🔄 Support more document formats (TXT, DOCX, HTML, etc.)
📊 Improve retrieval accuracy using hybrid search (BM25 + embeddings).
🎨 Enhance UI with real-time answer explanations.
🚀 Optimize response speed by caching retrieval results.
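
As a rough illustration of the caching idea, the sketch below memoizes retrieval results with Python's standard `functools.lru_cache`. The `slow_retrieve` function is a hypothetical stand-in for a vector-store query, and normalizing the question text before the lookup is an assumption, not something the project currently does.

```python
from functools import lru_cache


def slow_retrieve(question):
    """Hypothetical stand-in for an expensive vector-store query."""
    # A tuple is returned so the cached value cannot be mutated by callers.
    return (f"section relevant to: {question}",)


@lru_cache(maxsize=128)
def cached_retrieve(normalized_question):
    """Cache results keyed by the normalized question string."""
    return slow_retrieve(normalized_question)


def retrieve(question):
    """Normalize case and whitespace so equivalent questions share a cache entry."""
    return cached_retrieve(" ".join(question.lower().split()))
```

A cache like this would need to be cleared with `cached_retrieve.cache_clear()` whenever a new document is processed, since earlier results would no longer match the stored embeddings.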

📜 License

This project is licensed under the MIT License.

⭐ Contributing

Contributions are welcome! Feel free to fork this repository, submit issues, or create pull requests.

📧 For inquiries, reach out via GitHub Issues.

🚀 Happy Coding!
