AI RAG Telegram Chatbot

An end-to-end Retrieval-Augmented Generation (RAG) Telegram chatbot built with Python. It allows you to ingest and preprocess local documentation, embed it into a Chroma vector database, and chat with an LLM that can answer questions grounded in your private knowledge base.

✨ Features

RAG Pipeline: embeds your docs with sentence-transformers and stores them in Chroma for low-latency semantic search.
Telegram Bot Interface: talk to the assistant directly from Telegram (telegram_bot.py).
CLI Utilities: preprocess, chunk, embed, and query your docs from the command line (cli/, preprocessing/).
Markdown-Aware Responses: bot returns Telegram-flavoured Markdown via tg_bot/utils/markdown.py.
Modular Core: all retrieval / generation logic lives in core/ for easy reuse.

🗂️ Project Layout

├── cli/                # Command-line entry points
├── core/               # RAG core (retrieval, generation)
├── preprocessing/      # Cleaning, chunking, embedding scripts
├── tg_bot/             # Telegram flows, callbacks, utilities
├── utils/              # Generic helpers
├── telegram_bot.py     # Main bot launcher
├── requirements.txt    # Python deps
└── env.example         # Sample environment variables

🚀 Quickstart

Clone & install

git clone <your-fork-url> ai_rag_telegram_bot
cd ai_rag_telegram_bot
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

Configure environment

Copy env.example ➜ .env and fill in:
- TELEGRAM_BOT_TOKEN – from BotFather
- OPENAI_API_KEY – or another LLM provider key
- (optional) CHROMA_DB_DIR – where to store embeddings

Ingest documentation

# 1️⃣ Clean raw txt / md files
python -m preprocessing.clean_txt path/to/raw_docs/ cleaned/

# 2️⃣ Split into semantic chunks
python -m preprocessing.chunk_text cleaned/ chunks/

# 3️⃣ Embed & store in Chroma
python -m preprocessing.embed_chunks_chroma chunks/
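
To make step 2 more concrete, here is a minimal sketch of chunking. It is not the code in preprocessing/chunk_text: it swaps in a simpler technique (fixed-size word windows with overlap) for whatever semantic splitting the script actually performs. The function name `chunk_words` and the `size` / `overlap` parameters are hypothetical.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words.

    Illustrative only: a stand-in for the project's real chunker, which may
    split on headings, sentences, or token counts instead.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")

    words = text.split()
    step = size - overlap  # how far the window advances each iteration
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_words("a b c d e f g", size=4, overlap=2):
        print(chunk)
```

The overlap keeps a sentence that straddles a window boundary visible in both neighbouring chunks, which tends to help retrieval at the cost of some duplicated storage.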

Run the bot
```
python telegram_bot.py
```
Talk to your bot on Telegram 🎉.

🛠️ CLI Cheatsheet

| Command | Purpose |
| --- | --- |
| `python -m cli.rag_cli query "question"` | Quick query against the KB |
| `python -m preprocessing.explore_chroma_db` | Inspect stored embeddings |

🧩 How It Works

Retrieval: user query ➜ rag_core.retrieve() finds top-k similar chunks in Chroma.
Generation: selected contexts + user query ➜ LLM via utils/llm_utils.py.
Response: answer streamed back to Telegram with references.

All heavy lifting is decoupled so you can swap the vector DB or LLM provider easily.
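
The three steps and the "swap the backend" idea can be sketched as follows. This is a rough illustration under stated assumptions, not the code in core/: the `Retriever` protocol, `KeywordRetriever`, `build_prompt`, and `answer` are hypothetical names, the retriever uses bag-of-words cosine similarity in memory instead of Chroma embeddings, and the LLM is any callable that maps a prompt string to a reply.

```python
import math
from collections import Counter
from typing import Callable, Protocol


class Retriever(Protocol):
    """Anything with a retrieve(query, k) method can back the pipeline (hypothetical interface)."""

    def retrieve(self, query: str, k: int) -> list[str]: ...


class KeywordRetriever:
    """Toy stand-in for a vector DB: bag-of-words cosine similarity, no embeddings."""

    def __init__(self, chunks: list[str]) -> None:
        self.chunks = chunks
        self.vectors = [Counter(c.lower().split()) for c in chunks]

    @staticmethod
    def _cosine(a: Counter, b: Counter) -> float:
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    def retrieve(self, query: str, k: int) -> list[str]:
        q = Counter(query.lower().split())
        ranked = sorted(
            range(len(self.chunks)),
            key=lambda i: self._cosine(q, self.vectors[i]),
            reverse=True,
        )
        return [self.chunks[i] for i in ranked[:k]]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Number the retrieved chunks so the model can cite them as [n]."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer using only the context below and cite sources as [n].\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )


def answer(query: str, retriever: Retriever, llm: Callable[[str], str], k: int = 3) -> str:
    contexts = retriever.retrieve(query, k)   # 1. retrieval
    prompt = build_prompt(query, contexts)    # 2. generation input
    return llm(prompt)                        # 3. reply text sent back to the user


if __name__ == "__main__":
    docs = ["Chroma stores the embeddings.", "The bot runs on Telegram.", "Bananas are yellow."]
    # An identity "LLM" simply echoes the prompt, which makes the assembled context visible.
    print(answer("where does the bot run", KeywordRetriever(docs), llm=lambda p: p, k=1))
```

Because `answer` depends only on the `retrieve` method and a plain callable, replacing the toy retriever with a Chroma-backed one, or the echo function with a real provider client, would not require touching the pipeline itself.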

📄 Environment Variables

See env.example for the full list. The most important ones:

TELEGRAM_BOT_TOKEN
OPENAI_API_KEY
CHROMA_DB_DIR (default: ./chroma_db)
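
As a rough sketch of how these variables might be read at startup, assuming plain `os.environ` access: the `Settings` dataclass and `load_settings` function are hypothetical and not taken from the repository, and treating OPENAI_API_KEY as required is an assumption, since another provider key may be used instead.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables listed above."""

    telegram_bot_token: str
    openai_api_key: str
    chroma_db_dir: str = "./chroma_db"  # default stated in this README


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from a mapping (the process environment by default) and fail fast on gaps."""
    # Assumption: both keys are mandatory; adjust if you use a different LLM provider.
    required = ("TELEGRAM_BOT_TOKEN", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        telegram_bot_token=env["TELEGRAM_BOT_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        chroma_db_dir=env.get("CHROMA_DB_DIR") or "./chroma_db",
    )
```

Passing the mapping as an argument keeps the loader easy to exercise with a plain dict, without touching the real environment.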

🧪 Testing

No formal test suite yet. PRs adding pytest coverage are welcome!

🤝 Contributing

Fork → create feature branch (git checkout -b feature/XYZ).
Commit changes (git commit -m 'feat: XYZ').
Push & open a PR describing your changes.
