edkovalski/ai_rag_telegram_bot

AI RAG Telegram Chatbot

An end-to-end Retrieval-Augmented Generation (RAG) Telegram chatbot built with Python. It allows you to ingest and preprocess local documentation, embed it into a Chroma vector database, and chat with an LLM that can answer questions grounded in your private knowledge base.


✨ Features

  • RAG Pipeline: embeds your docs with sentence-transformers and stores them in Chroma for low-latency semantic search.
  • Telegram Bot Interface: talk to the assistant directly from Telegram (telegram_bot.py).
  • CLI Utilities: preprocess, chunk, embed, and query your docs from the command line (cli/, preprocessing/).
  • Markdown-Aware Responses: bot returns Telegram-flavoured Markdown via tg_bot/utils/markdown.py.
  • Modular Core: all retrieval / generation logic lives in core/ for easy reuse.
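As an illustration of what Markdown-aware output involves, here is a minimal sketch of escaping plain text for Telegram's MarkdownV2 parse mode. It is not the code in tg_bot/utils/markdown.py: the function name `escape_markdown_v2` is hypothetical, and whether the bot uses MarkdownV2 or the legacy Markdown mode is an assumption.

```python
# Characters that Telegram's MarkdownV2 parse mode requires to be escaped
# when they appear in ordinary text (outside code and pre entities).
MDV2_SPECIAL = set("_*[]()~`>#+-=|{}.!")


def escape_markdown_v2(text: str) -> str:
    """Hypothetical helper: backslash-escape MarkdownV2 special characters.

    Note: this sketch does not handle literal backslashes or text that is
    already formatted; it is meant for raw, untrusted strings only.
    """
    return "".join("\\" + ch if ch in MDV2_SPECIAL else ch for ch in text)


if __name__ == "__main__":
    print(escape_markdown_v2("Release v1.0 (beta)!"))
```

Escaping LLM output before sending it with a Markdown parse mode avoids the "can't parse entities" errors Telegram returns for unbalanced or unescaped special characters.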

🗂️ Project Layout

├── cli/                # Command-line entry points
├── core/               # RAG core (retrieval, generation)
├── preprocessing/      # Cleaning, chunking, embedding scripts
├── tg_bot/             # Telegram flows, callbacks, utilities
├── utils/              # Generic helpers
├── telegram_bot.py     # Main bot launcher
├── requirements.txt    # Python deps
└── env.example         # Sample environment variables

🚀 Quickstart

  1. Clone & install
    git clone <your-fork-url> ai_rag_telegram_bot
    cd ai_rag_telegram_bot
    python -m venv .venv && source .venv/bin/activate
    pip install -r requirements.txt
  2. Configure environment: copy env.example to .env and fill in:
    • TELEGRAM_BOT_TOKEN – from BotFather
    • OPENAI_API_KEY – or another LLM provider key
    • (optional) CHROMA_DB_DIR – where to store embeddings
  3. Ingest documentation
    # 1️⃣ Clean raw txt / md files
    python -m preprocessing.clean_txt path/to/raw_docs/ cleaned/
    
    # 2️⃣ Split into semantic chunks
    python -m preprocessing.chunk_text cleaned/ chunks/
    
    # 3️⃣ Embed & store in Chroma
    python -m preprocessing.embed_chunks_chroma chunks/
  4. Run the bot
    python telegram_bot.py
    Talk to your bot on Telegram 🎉.
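To make the chunking step in the ingestion stage more concrete, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption about the general technique, not the logic of preprocessing.chunk_text (which the README describes as semantic chunking); the function name and the `max_words` and `overlap` parameters are hypothetical.

```python
def chunk_words(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into word windows of at most max_words.

    Consecutive windows share `overlap` words, so a sentence cut at a
    boundary still appears whole in at least one chunk.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")

    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(10))
    for chunk in chunk_words(sample, max_words=4, overlap=1):
        print(chunk)
```

Larger overlaps improve recall at chunk boundaries at the cost of storing more, partly redundant, embeddings.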

🛠️ CLI Cheatsheet

Command                                      Purpose
python -m cli.rag_cli query "question"       Quick query against the KB
python -m preprocessing.explore_chroma_db    Inspect stored embeddings

🧩 How It Works

  1. Retrieval: user query ➜ rag_core.retrieve() finds top-k similar chunks in Chroma.
  2. Generation: selected contexts + user query ➜ LLM via utils/llm_utils.py.
  3. Response: answer streamed back to Telegram with references.

All heavy lifting is decoupled so you can swap the vector DB or LLM provider easily.
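The retrieve-then-generate flow above can be sketched in a few lines of dependency-free Python. This is a toy model under stated assumptions: word-count vectors stand in for sentence-transformers embeddings, an in-memory list stands in for Chroma, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical rather than the project's actual API. The LLM call itself is omitted.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Assemble the grounded prompt that would be sent to the LLM."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, 1))
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    kb = [
        "the bot token comes from BotFather",
        "chroma stores embeddings on disk",
        "pytest is not set up yet",
    ]
    question = "where does chroma store embeddings"
    print(build_prompt(question, retrieve(question, kb, k=1)))
```

Numbering the contexts in the prompt is one simple way to let the model cite its sources, which matches the "with references" behaviour described in step 3.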


📄 Environment Variables

See env.example for the full list. The most important ones:

  • TELEGRAM_BOT_TOKEN
  • OPENAI_API_KEY
  • CHROMA_DB_DIR (default: ./chroma_db)
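A minimal sketch of reading these variables at startup, failing fast when a required one is missing. The `Settings` class and `load_settings` function are hypothetical, not part of the repository, and treating OPENAI_API_KEY as mandatory is an assumption (the Quickstart allows another provider's key instead).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    telegram_bot_token: str
    openai_api_key: str
    chroma_db_dir: str = "./chroma_db"  # default documented in this README


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    required = ("TELEGRAM_BOT_TOKEN", "OPENAI_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        telegram_bot_token=env["TELEGRAM_BOT_TOKEN"],
        openai_api_key=env["OPENAI_API_KEY"],
        chroma_db_dir=env.get("CHROMA_DB_DIR", "./chroma_db"),
    )


if __name__ == "__main__":
    demo = {"TELEGRAM_BOT_TOKEN": "123:abc", "OPENAI_API_KEY": "sk-demo"}
    print(load_settings(demo).chroma_db_dir)
```

Accepting a mapping instead of reading os.environ directly keeps the loader easy to exercise in tests without touching the real environment.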

🧪 Testing

No formal test suite yet. PRs adding pytest coverage are welcome!


🤝 Contributing

  1. Fork → create feature branch (git checkout -b feature/XYZ).
  2. Commit changes (git commit -m 'feat: XYZ').
  3. Push & open a PR describing your changes.

📜 License

MIT License © 2025 Your Name
