📚 Cognibook

Talk to your books. Turn PDFs into real-time AI voice conversations.

Cognibook is an AI-powered full-stack web application that transforms static PDFs into interactive, conversational experiences. Upload a book and interact with it using real-time voice powered by AI.


🚀 Features

  • 📄 PDF Upload & Processing: Upload books and automatically extract and segment their text.

  • 🧠 Retrieval-Augmented Generation (RAG): Retrieves relevant book content and passes it to the AI to generate grounded responses.

  • 🎙️ Real-Time Voice Conversations: Talk to your books using natural voice synthesis powered by Vapi.

  • 📚 Personal Library: Manage your uploaded books in a library tied to your account.

  • ⏱️ Session Tracking & Limits: Track conversation duration with subscription-based limits.

  • 🔐 Authentication with Clerk: Secure login and per-user data isolation.

  • ☁️ Cloud Storage: Files are stored using Vercel Blob.


💳 Pricing Plans

Plan         Features
🆓 Free      Limited books, short voice sessions
Standard     Increased limits, longer sessions
🚀 Pro       Full access, extended sessions

🧠 How It Works

User uploads PDF
   ↓
File stored in Vercel Blob
   ↓
Text extracted & split into segments
   ↓
Stored in MongoDB
   ↓
User asks a question (voice)
   ↓
Relevant segments retrieved (RAG)
   ↓
AI generates response
   ↓
Voice response played to user
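
The "split into segments" step can be sketched as a word-based splitter with a small overlap between neighbouring segments, so a sentence cut at a boundary still appears whole in one of them. This is a minimal illustration, not the project's actual code: `splitIntoSegments`, the `Segment` shape, and the default sizes are all assumptions.

```typescript
// Hypothetical segment shape; the real schema stored in MongoDB may differ.
interface Segment {
  index: number;
  text: string;
  wordCount: number;
}

// Split extracted text into word-based segments that overlap slightly.
// maxWords and overlapWords are illustrative defaults, not the project's values.
function splitIntoSegments(
  text: string,
  maxWords = 500,
  overlapWords = 50,
): Segment[] {
  if (maxWords <= 0 || overlapWords < 0 || overlapWords >= maxWords) {
    throw new Error("Require maxWords > 0 and 0 <= overlapWords < maxWords");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const segments: Segment[] = [];
  const step = maxWords - overlapWords; // how far each new segment advances

  for (let start = 0; start < words.length; start += step) {
    const chunk = words.slice(start, start + maxWords);
    segments.push({
      index: segments.length,
      text: chunk.join(" "),
      wordCount: chunk.length,
    });
    // Stop once a segment has reached the end of the text.
    if (start + maxWords >= words.length) break;
  }
  return segments;
}
```

Each returned segment could then be saved as its own document alongside the book's ID, which keeps later retrieval limited to small pieces of text.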

🏗️ Architecture Diagram

                ┌──────────────────────┐
                │     Frontend UI      │
                │   (Next.js / React)  │
                └─────────┬────────────┘
                          │
                          ▼
        ┌──────────────────────────────────┐
        │   Next.js Server (API Routes)    │
        │  - Upload Handling               │
        │  - Book Processing               │
        │  - Vapi Integration              │
        └─────────┬──────────┬─────────────┘
                  │          │
                  │          │
                  ▼          ▼
     ┌────────────────┐   ┌────────────────────┐
     │ MongoDB Atlas  │   │  Vercel Blob       │
     │ (Book Data +   │   │  (PDF + Images)    │
     │  Segments)     │   └────────────────────┘
     └────────────────┘
                  │
                  ▼
        ┌──────────────────────────┐
        │   Vapi + LLM (AI Layer)  │
        │ - Speech-to-Text         │
        │ - Response Generation    │
        │ - Text-to-Speech         │
        └──────────────────────────┘
                  │
                  ▼
        🎙️ Real-time Voice Output

        🔐 Clerk (Auth + Billing)
        - User authentication
        - Subscription management
        - Feature gating

🎯 Key Challenges Solved

1. 🧠 Efficient Retrieval for Large PDFs

  • Problem: A whole book is too large to send to the AI model in a single request
  • Solution: Split documents into segments and use MongoDB text indexing to retrieve only relevant chunks
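
As a rough sketch of what a text-index lookup might look like, the helper below builds a MongoDB `$text` query that ranks segments by `textScore`. The `$text`, `$search`, and `$meta: "textScore"` operators are standard MongoDB; the helper name `buildSegmentSearch` and the field names `bookId` and `content` are assumptions made for illustration.

```typescript
// Shape of the query pieces this sketch produces.
// bookId and content are hypothetical field names.
interface SegmentSearch {
  filter: { bookId: string; $text: { $search: string } };
  options: {
    projection: { content: 1; score: { $meta: "textScore" } };
    sort: { score: { $meta: "textScore" } };
    limit: number;
  };
}

// Build a filter and options that return the best-matching segments of one book.
function buildSegmentSearch(
  bookId: string,
  question: string,
  limit = 5,
): SegmentSearch {
  const search = question.trim();
  if (search.length === 0) {
    throw new Error("question must not be empty");
  }
  return {
    // $text uses the collection's text index to match the question's terms.
    filter: { bookId, $text: { $search: search } },
    options: {
      // Expose the relevance score and sort by it, highest first.
      projection: { content: 1, score: { $meta: "textScore" } },
      sort: { score: { $meta: "textScore" } },
      limit,
    },
  };
}
```

With the official Node.js driver, the result could be passed as `collection.find(filter, options)`, provided a text index exists on the searched field, for example one created with `createIndex({ content: "text" })`.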

2. ⚡ Real-Time Voice Interaction

  • Problem: Maintaining smooth conversational flow
  • Solution: Built a state machine (idle → listening → thinking → speaking) using a custom useVapi hook
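
The idle → listening → thinking → speaking flow can be modelled as a small transition table. The sketch below is only an illustration of that idea: the event names are invented for this example and are not Vapi's event names, and the real `useVapi` hook may be structured differently.

```typescript
type VoiceState = "idle" | "listening" | "thinking" | "speaking";

// Hypothetical events; a real hook would map SDK callbacks onto something like these.
type VoiceEvent =
  | "CALL_STARTED"
  | "USER_FINISHED"
  | "ASSISTANT_STARTED"
  | "ASSISTANT_FINISHED"
  | "CALL_ENDED";

// Allowed transitions per state. Anything not listed is ignored.
const transitions: Record<VoiceState, Partial<Record<VoiceEvent, VoiceState>>> = {
  idle: { CALL_STARTED: "listening" },
  listening: { USER_FINISHED: "thinking", CALL_ENDED: "idle" },
  thinking: { ASSISTANT_STARTED: "speaking", CALL_ENDED: "idle" },
  speaking: { ASSISTANT_FINISHED: "listening", CALL_ENDED: "idle" },
};

// Pure reducer: returns the next state, or the current one if the event is not valid here.
function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  return transitions[state][event] ?? state;
}
```

Because `nextVoiceState` is a pure function of `(state, event)`, it could be passed directly to React's `useReducer` inside a hook, and out-of-order events leave the UI state unchanged instead of corrupting it.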

3. ⏱️ Session Timing & Limits

  • Problem: Enforcing usage limits per user
  • Solution: Implemented duration tracking + max session limits with automatic session termination
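
One way to picture duration tracking with automatic termination is a pure check that runs on every timer tick. This is a sketch under assumptions: `checkSession` and `SessionStatus` are hypothetical names, and the maximum duration would come from the user's plan.

```typescript
interface SessionStatus {
  elapsedSeconds: number;
  remainingSeconds: number;
  shouldEnd: boolean; // true once the plan's limit has been reached
}

// Compare elapsed time against the allowed maximum.
// Timestamps are milliseconds since the epoch, as returned by Date.now().
function checkSession(
  startedAtMs: number,
  nowMs: number,
  maxSeconds: number,
): SessionStatus {
  const elapsedSeconds = Math.max(0, Math.floor((nowMs - startedAtMs) / 1000));
  const remainingSeconds = Math.max(0, maxSeconds - elapsedSeconds);
  return {
    elapsedSeconds,
    remainingSeconds,
    shouldEnd: elapsedSeconds >= maxSeconds,
  };
}
```

A client could call this once per second from an interval and end the voice call as soon as `shouldEnd` is true. Since a client-side timer can be bypassed, recording the duration on the server as well would be the safer design.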

4. 💳 Subscription-Based Feature Gating

  • Problem: Restricting features based on user plan

  • Solution: Integrated Clerk Billing to dynamically control:

    • Number of books
    • Session duration
    • Feature access
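
Plan-based gating often reduces to a lookup table plus a few checks. The sketch below assumes the user's plan has already been resolved (how Clerk Billing reports it is not shown here), and every number in `PLAN_LIMITS` is a placeholder, not Cognibook's real limit.

```typescript
type Plan = "free" | "standard" | "pro";

interface PlanLimits {
  maxBooks: number;
  maxSessionMinutes: number;
}

// Placeholder values for illustration only; the real limits are not documented here.
const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  free: { maxBooks: 1, maxSessionMinutes: 5 },
  standard: { maxBooks: 10, maxSessionMinutes: 15 },
  pro: { maxBooks: 100, maxSessionMinutes: 60 },
};

// True if a user on this plan may upload one more book.
function canUploadBook(plan: Plan, currentBookCount: number): boolean {
  return currentBookCount < PLAN_LIMITS[plan].maxBooks;
}

// Maximum voice-session length for the plan, in seconds.
function maxSessionSeconds(plan: Plan): number {
  return PLAN_LIMITS[plan].maxSessionMinutes * 60;
}
```

Keeping the limits in one table means the upload route and the session timer read from the same source, so changing a plan only requires editing one place.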

5. 🔍 RAG Pipeline Integration

  • Problem: Reducing AI hallucinations

  • Solution: Built a Retrieval-Augmented Generation pipeline:

    • Retrieve relevant segments
    • Feed context to AI
    • Generate responses grounded in the retrieved text
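
The "feed context to AI" step can be illustrated with a prompt builder that ranks retrieved segments, fits them into a character budget, and tells the model to stay within them. This is a hedged sketch: `buildGroundedPrompt`, the `RetrievedSegment` shape, the budget, and the instruction wording are all assumptions.

```typescript
// Hypothetical shape of a segment returned by retrieval.
interface RetrievedSegment {
  content: string;
  score: number; // higher means more relevant
}

// Assemble a prompt that restricts the model to the retrieved excerpts.
function buildGroundedPrompt(
  question: string,
  segments: RetrievedSegment[],
  maxChars = 4000, // illustrative context budget
): string {
  // Most relevant first, without mutating the caller's array.
  const ranked = [...segments].sort((a, b) => b.score - a.score);

  const excerpts: string[] = [];
  let used = 0;
  for (const seg of ranked) {
    if (used + seg.content.length > maxChars) break; // budget exhausted
    excerpts.push(`[${excerpts.length + 1}] ${seg.content}`);
    used += seg.content.length;
  }

  const context =
    excerpts.length > 0 ? excerpts.join("\n\n") : "(no relevant excerpts found)";

  return [
    "Answer using only the book excerpts below. If they do not contain the answer, say so.",
    "",
    "Excerpts:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Asking the model to admit when the excerpts lack an answer is what ties the response to the book; without that instruction, retrieval alone does not prevent invented details.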

6. ☁️ Scalable File Handling

  • Problem: Storing large PDFs efficiently
  • Solution: Used Vercel Blob for scalable file storage and CDN delivery

🏗️ Tech Stack

  • Frontend: Next.js 16, React, Tailwind CSS
  • Backend: Next.js Server Actions & API Routes
  • Database: MongoDB (Atlas)
  • Authentication: Clerk
  • AI Voice: Vapi + ElevenLabs
  • Storage: Vercel Blob
  • State Management: Custom React Hooks

📂 Project Structure

app/
  ├── (root)/
  ├── api/
  │    ├── upload/
  │    └── vapi/
  ├── books/
components/
hooks/
lib/
database/

⚙️ Environment Variables

MONGODB_URI=
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
CLERK_SECRET_KEY=
BLOB_READ_WRITE_TOKEN=
NEXT_PUBLIC_VAPI_API_KEY=
NEXT_PUBLIC_ASSISTANT_ID=
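
A missing variable usually surfaces as a confusing runtime error, so a startup check can help. The helper below is an optional sketch and not part of the repository: `findMissingEnvVars` is a hypothetical name, and the list simply mirrors the variables above.

```typescript
// Variable names copied from the list above.
const REQUIRED_ENV_VARS = [
  "MONGODB_URI",
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "BLOB_READ_WRITE_TOKEN",
  "NEXT_PUBLIC_VAPI_API_KEY",
  "NEXT_PUBLIC_ASSISTANT_ID",
] as const;

// Return the names of required variables that are unset or blank.
function findMissingEnvVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_ENV_VARS.filter((name) => (env[name] ?? "").trim() === "");
}
```

On the server this could be called as `findMissingEnvVars(process.env)` during startup, throwing an error that lists the missing names.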

🛠️ Installation

git clone https://github.com/sagnikbose-11-01/Cognibook.git
cd Cognibook
npm install
npm run dev

🤝 Contributing

Contributions are welcome! Feel free to fork the repo and submit a PR.


👨‍💻 Author

Built with ❤️ by Sagnik Bose

