Talk to your books. Turn PDFs into real-time AI voice conversations.
Cognibook is an AI-powered full-stack web application that transforms static PDFs into interactive, conversational experiences. Upload a book and interact with it using real-time voice powered by AI.
- 📄 PDF Upload & Processing: Upload books and automatically extract and segment text.
- 🧠 Retrieval-Augmented Generation (RAG): Fetches relevant book content and uses AI to generate accurate responses.
- 🎙️ Real-Time Voice Conversations: Talk to your books using natural voice synthesis powered by Vapi.
- 📚 Personal Library: Manage your uploaded books with authentication.
- ⏱️ Session Tracking & Limits: Track conversation duration with subscription-based limits.
- 🔐 Authentication with Clerk: Secure login and user-based data isolation.
- ☁️ Cloud Storage: Files stored using Vercel Blob.

| Plan | Features |
|---|---|
| 🆓 Free | Limited books, short voice sessions |
| ⭐ Standard | Increased limits, longer sessions |
| 🚀 Pro | Full access, extended sessions |

    User uploads PDF
            ↓
    File stored in Vercel Blob
            ↓
    Text extracted & split into segments
            ↓
    Stored in MongoDB
            ↓
    User asks a question (voice)
            ↓
    Relevant segments retrieved (RAG)
            ↓
    AI generates response
            ↓
    Voice response played to user

    ┌──────────────────────┐
    │     Frontend UI      │
    │  (Next.js / React)   │
    └─────────┬────────────┘
              │
              ▼
    ┌──────────────────────────────────┐
    │   Next.js Server (API Routes)    │
    │   - Upload Handling              │
    │   - Book Processing              │
    │   - Vapi Integration             │
    └─────────┬──────────┬─────────────┘
              │          │
              │          │
              ▼          ▼
    ┌────────────────┐ ┌────────────────────┐
    │ MongoDB Atlas  │ │    Vercel Blob     │
    │ (Book Data +   │ │   (PDF + Images)   │
    │  Segments)     │ └────────────────────┘
    └────────────────┘
              │
              ▼
    ┌──────────────────────────┐
    │  Vapi + LLM (AI Layer)   │
    │  - Speech-to-Text        │
    │  - Response Generation   │
    │  - Text-to-Speech        │
    └──────────────────────────┘
              │
              ▼
    🎙️ Real-time Voice Output

🔐 Clerk (Auth + Billing)
- User authentication
- Subscription management
- Feature gating

- Problem: AI cannot process entire PDFs at once
- Solution: Split documents into segments and use MongoDB text indexing to retrieve only relevant chunks
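
A minimal sketch of the segmentation step, assuming word-based windows with a small overlap. The `Segment` shape, the function names, and the default sizes are illustrative assumptions and are not taken from the project. `buildSegmentQuery` only builds the filter object for a MongoDB `$text` search, which requires a text index on the searched field.

```typescript
// Hypothetical shape for a stored segment; field names are illustrative.
interface Segment {
  bookId: string;
  index: number;   // position of the segment within the book
  content: string; // plain text of the segment
}

// Split extracted text into word-bounded windows. Neighbouring windows share
// `overlap` words, so text near a boundary appears in both segments.
function splitIntoSegments(
  bookId: string,
  text: string,
  maxWords = 500, // assumed segment size
  overlap = 50    // assumed overlap between neighbouring segments
): Segment[] {
  if (overlap >= maxWords) {
    throw new Error("overlap must be smaller than maxWords");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const segments: Segment[] = [];
  const step = maxWords - overlap;
  for (let start = 0; start < words.length; start += step) {
    segments.push({
      bookId,
      index: segments.length,
      content: words.slice(start, start + maxWords).join(" "),
    });
    if (start + maxWords >= words.length) break; // last window reached the end
  }
  return segments;
}

// Filter for a MongoDB text search scoped to one book. `$text` needs a text
// index on the field, e.g. one created with createIndex({ content: "text" }).
function buildSegmentQuery(bookId: string, question: string) {
  return { bookId, $text: { $search: question } };
}
```

The overlap trades a little extra storage for a lower chance that an answer is split across two segments.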
- Problem: Maintaining smooth conversational flow
- Solution: Built a state machine (idle → listening → thinking → speaking) using a custom `useVapi` hook
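
The state machine above could be modelled as a small transition table. This is a sketch only: the event names are hypothetical, and the `speaking → listening` and `CALL_ENDED → idle` transitions are assumptions about how a conversation loops and ends, not details confirmed by the project.

```typescript
type CallState = "idle" | "listening" | "thinking" | "speaking";

// Hypothetical event names; a real hook would map voice SDK callbacks
// onto events like these.
type CallEvent =
  | "CALL_STARTED"
  | "USER_FINISHED_SPEAKING"
  | "ASSISTANT_STARTED_SPEAKING"
  | "ASSISTANT_FINISHED_SPEAKING"
  | "CALL_ENDED";

// Allowed transitions per state. Any event not listed for a state is ignored.
const transitions: Record<CallState, Partial<Record<CallEvent, CallState>>> = {
  idle: { CALL_STARTED: "listening" },
  listening: { USER_FINISHED_SPEAKING: "thinking", CALL_ENDED: "idle" },
  thinking: { ASSISTANT_STARTED_SPEAKING: "speaking", CALL_ENDED: "idle" },
  // Assumption: after the assistant finishes, the call returns to listening.
  speaking: { ASSISTANT_FINISHED_SPEAKING: "listening", CALL_ENDED: "idle" },
};

// Pure reducer: returns the next state, or the current one if the event
// is not valid in this state.
function nextState(state: CallState, event: CallEvent): CallState {
  return transitions[state][event] ?? state;
}
```

Because `nextState` is a pure function with a reducer signature, it could be passed to React's `useReducer` inside a custom hook, which keeps out-of-order SDK events from putting the UI in an impossible state.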
- Problem: Enforcing usage limits per user
- Solution: Implemented duration tracking + max session limits with automatic session termination
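
One way to sketch the duration tracking is a pure helper that computes the remaining time, plus a timer that fires a callback once the limit is reached. The function names and the one-second polling interval are assumptions for illustration; how the project actually ends a call is not shown here.

```typescript
// Seconds left before the session hits its limit (never negative).
function remainingSeconds(
  startedAtMs: number,
  nowMs: number,
  maxSeconds: number
): number {
  const elapsed = Math.floor((nowMs - startedAtMs) / 1000);
  return Math.max(0, maxSeconds - elapsed);
}

// Check once per second and call `onLimit` a single time when the session
// runs out. Returns a function that cancels the watcher early.
function watchSessionLimit(
  startedAtMs: number,
  maxSeconds: number,
  onLimit: () => void,        // e.g. stop the voice call
  now: () => number = Date.now // injectable clock, handy for tests
): () => void {
  const timer = setInterval(() => {
    if (remainingSeconds(startedAtMs, now(), maxSeconds) === 0) {
      clearInterval(timer);
      onLimit();
    }
  }, 1000);
  return () => clearInterval(timer);
}
```

Computing the remaining time from timestamps, rather than counting ticks, keeps the limit accurate even if the browser throttles timers in a background tab.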
- Problem: Restricting features based on user plan
- Solution: Integrated Clerk Billing to dynamically control:
  - Number of books
  - Session duration
  - Feature access
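
As a rough illustration of plan-based gating, the limits can live in a single lookup table keyed by plan. The numbers below are placeholders, since the real limits are not stated here, and resolving the user's current plan from Clerk Billing is deliberately left out of the sketch.

```typescript
type Plan = "free" | "standard" | "pro";

interface PlanLimits {
  maxBooks: number;
  maxSessionMinutes: number;
}

// Placeholder values for illustration only; not the project's actual limits.
const PLAN_LIMITS: Record<Plan, PlanLimits> = {
  free: { maxBooks: 1, maxSessionMinutes: 5 },
  standard: { maxBooks: 10, maxSessionMinutes: 15 },
  pro: { maxBooks: 100, maxSessionMinutes: 60 },
};

// True if a user on `plan` who already owns `currentBookCount` books
// may upload one more.
function canUploadBook(plan: Plan, currentBookCount: number): boolean {
  return currentBookCount < PLAN_LIMITS[plan].maxBooks;
}

// Maximum voice session length for a plan, in seconds.
function maxSessionSeconds(plan: Plan): number {
  return PLAN_LIMITS[plan].maxSessionMinutes * 60;
}
```

Keeping every limit in one table means the upload check and the session timer read from the same source, so a plan change only needs to be made in one place.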
- Problem: Avoiding AI hallucinations
- Solution: Built a Retrieval-Augmented Generation pipeline:
  - Retrieve relevant segments
  - Feed context to AI
  - Generate accurate responses
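
The "feed context to AI" step might look like the sketch below, which packs already-retrieved segments into a prompt under a character budget. The function name, the instruction wording, and the 4000-character default are assumptions; the project's actual prompt is not shown here.

```typescript
// Build a prompt that restricts the model to the retrieved excerpts.
// Segments are expected in relevance order; packing stops at the first
// segment that would exceed the character budget.
function buildGroundedPrompt(
  question: string,
  segments: string[],
  maxContextChars = 4000 // assumed budget
): string {
  const picked: string[] = [];
  let used = 0;
  for (const segment of segments) {
    if (used + segment.length > maxContextChars) break;
    picked.push(`[${picked.length + 1}] ${segment}`);
    used += segment.length;
  }
  return [
    "Answer using only the book excerpts below.",
    "If the excerpts do not contain the answer, say so.",
    "",
    "Excerpts:",
    ...picked,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Telling the model to admit when the excerpts lack an answer is a common way to reduce hallucinations, though it does not eliminate them.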
- Problem: Storing large PDFs efficiently
- Solution: Used Vercel Blob for scalable file storage and CDN delivery
- Frontend: Next.js 16, React, Tailwind CSS
- Backend: Next.js Server Actions & API Routes
- Database: MongoDB (Atlas)
- Authentication: Clerk
- AI Voice: Vapi + ElevenLabs
- Storage: Vercel Blob
- State Management: Custom React Hooks

    app/
    ├── (root)/
    ├── api/
    │   ├── upload/
    │   └── vapi/
    └── books/
    components/
    hooks/
    lib/
    database/

Set the following environment variables:

    MONGODB_URI=
    NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=
    CLERK_SECRET_KEY=
    BLOB_READ_WRITE_TOKEN=
    NEXT_PUBLIC_VAPI_API_KEY=
    NEXT_PUBLIC_ASSISTANT_ID=

To run the project locally:

    git clone https://github.com/sagnikbose-11-01/Cognibook.git
    cd Cognibook
    npm install
    npm run dev

Contributions are welcome! Feel free to fork the repo and submit a PR.

Built with ❤️ by Sagnik Bose