A FastAPI-powered backend with a Streamlit frontend that orchestrates multiple LLM providers (Google Gemini and Ollama) for creative content generation. Built with LangChain for seamless model switching and LangServe for production-ready REST endpoints.
Backend: FastAPI server exposing LangChain chains as REST endpoints via LangServe
Frontend: Streamlit client with tabbed UI for essay and poem generation
Models:
- Gemini 2.5 Flash (via Google GenAI API) → Essay generation
- Llama 3.2 (via Ollama local runtime) → Poem generation
llama-gemini-chat/
├── api/
│ ├── app.py # FastAPI server with LangServe routes
│ └── client.py # Streamlit UI client
├── requirements.txt
├── .env # API keys (not committed)
└── .gitignore
Figures: Main Feed, LLM Response
api/app.py initializes three LangServe routes:
- /gemini - Direct Gemini model access (generic chat)
- /essay - Prompt-templated essay generation via Gemini
- /poem - Prompt-templated poem generation via Ollama Llama
Key Implementation Details:
- Uses ChatPromptTemplate for structured prompt engineering
- Chains prompts with models using LangChain's | operator
- Loads GOOGLE_API_KEY from environment with validation
- API runs on 127.0.0.1:8000 by default
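The "loads GOOGLE_API_KEY with validation" step can be sketched as a fail-fast lookup. This is a minimal illustration using only the standard library, not the code in app.py; the helper name `require_env` and the error message are assumptions.

```python
import os


def require_env(name: str) -> str:
    """Return an environment variable's value, or fail fast if it is unset or blank.

    Hypothetical helper: the real app.py may validate the key differently.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to .env or export it before starting the server"
        )
    return value
```

Calling `require_env("GOOGLE_API_KEY")` once at startup (after python-dotenv has loaded .env) would surface a missing key immediately instead of on the first Gemini request.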
api/client.py provides a dual-tab interface for calling the LangServe endpoints:
- Tab 1: Essay Generator → POST /essay/invoke
- Tab 2: Poem Generator → POST /poem/invoke
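As a rough sketch of the tab-to-endpoint mapping, the snippet below builds (without sending) the POST request each tab would issue. It uses the standard library's urllib so it runs without extra packages; the actual client may well use a different HTTP library, and the names `TAB_ROUTES` and `build_invoke_request` are assumptions.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default server address from this README

# Tab label -> LangServe route, as listed above
TAB_ROUTES = {
    "Essay Generator": "/essay/invoke",
    "Poem Generator": "/poem/invoke",
}


def build_invoke_request(
    tab: str, topic: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build, but do not send, the POST request a tab would issue."""
    payload = {"input": {"topic": topic}}  # body shape expected by /invoke
    return urllib.request.Request(
        url=base_url + TAB_ROUTES[tab],
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Keeping the routes in one dictionary means a new tab only needs a new entry rather than another hand-written URL.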
Request Format:
{
"input": {
"topic": "your_topic_here"
}
}
Response Handling:
- Parses output.content for dict responses
- Falls back to raw output for string responses
- 120s timeout for long-running LLM calls
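The dict-versus-string handling above could look roughly like this. It is a sketch under the assumption that chat-model responses arrive as an object with a "content" field while plain LLM responses arrive as a bare string; `extract_text` is a hypothetical name, not a function from client.py.

```python
from typing import Any


def extract_text(response_json: dict[str, Any]) -> str:
    """Pull the generated text out of a parsed /invoke response body."""
    output = response_json.get("output")
    if isinstance(output, dict):
        # Message-style output: the text lives under "content"
        return str(output.get("content", ""))
    if isinstance(output, str):
        # Plain-string output: use it as-is
        return output
    # Missing or unexpected output: return something printable
    return "" if output is None else str(output)
```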
- Ollama Runtime (for local Llama inference)

      # Install Ollama: https://ollama.ai
      ollama pull llama3.2  # or any other model

- Google API Key (for Gemini access)
  - Get key from Google AI Studio
- Clone & Install Dependencies

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate
      pip install -r requirements.txt

- Configure Environment

      # Create .env file
      GOOGLE_API_KEY=your_actual_google_api_key
      LANGCHAIN_API_KEY=your_langchain_key  # Optional for tracing
      LANGCHAIN_PROJECT_NAME=ChatBotProject

- Verify Ollama

      ollama list  # Should show llama3.2

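The "Verify Ollama" check can also be scripted. The sketch below assumes `ollama list` prints a header row followed by one row per model with the model name (usually tagged, e.g. `llama3.2:latest`) in the first column; both function names are hypothetical.

```python
import subprocess


def model_listed(list_output: str, model: str) -> bool:
    """Return True if `model` appears in the first column of `ollama list` output."""
    for line in list_output.splitlines()[1:]:  # skip the header row
        columns = line.split()
        if not columns:
            continue
        name = columns[0]
        # Match either the exact name or the name without its ":tag" suffix
        if name == model or name.split(":", 1)[0] == model:
            return True
    return False


def ollama_has_model(model: str = "llama3.2") -> bool:
    """Run `ollama list`; return False if the CLI is missing, fails, or lacks the model."""
    try:
        result = subprocess.run(
            ["ollama", "list"], capture_output=True, text=True, timeout=10, check=True
        )
    except (OSError, subprocess.SubprocessError):
        return False
    return model_listed(result.stdout, model)
```

Running `ollama_has_model()` before starting the server gives an early warning that the poem route would fail.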
python api/app.py
# Server runs on http://127.0.0.1:8000
# Docs at http://127.0.0.1:8000/docs
streamlit run api/client.py
# Opens browser at http://localhost:8501
# Generate essay
curl -X POST http://localhost:8000/essay/invoke \
-H "Content-Type: application/json" \
-d '{"input": {"topic": "quantum computing"}}'
# Generate poem
curl -X POST http://localhost:8000/poem/invoke \
-H "Content-Type: application/json" \
-d '{"input": {"topic": "outer space"}}'

| Package | Purpose |
|---|---|
| fastapi + uvicorn | API framework and ASGI server |
| langchain_google_genai | Gemini model integration |
| langchain_ollama | Ollama local model wrapper |
| langserve | Exposes LangChain chains as REST APIs |
| streamlit | Frontend UI builder |
| python-dotenv | Environment variable management |
- Prompt Templates: Hardcoded to 100-word outputs for consistency
- Model Selection: Essays use Gemini (cloud), poems use Llama (local) to demonstrate hybrid deployment
- Error Handling: Client includes 120s timeout + structured exception handling
- LangServe Routes: Auto-generates OpenAPI schemas at /docs for all chains
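One way the client-side timeout and structured exception handling might be arranged is sketched below. It uses the standard library's urllib rather than whatever HTTP library client.py actually uses, and `invoke_chain` with its `(ok, message)` return shape is an assumption made for illustration.

```python
import json
import socket
import urllib.error
import urllib.request


def invoke_chain(url: str, topic: str, timeout: float = 120.0) -> tuple[bool, str]:
    """POST a topic to an /invoke URL; return (ok, generated_text_or_error_message)."""
    body = json.dumps({"input": {"topic": topic}}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            data = json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return False, f"HTTP {exc.code}: {exc.reason}"
    except urllib.error.URLError as exc:  # connection refused, DNS failure, ...
        return False, f"Could not reach server: {exc.reason}"
    except (TimeoutError, socket.timeout):  # no data within `timeout` seconds
        return False, f"Request timed out after {timeout:.0f}s"
    except json.JSONDecodeError:
        return False, "Server returned a non-JSON response"

    output = data.get("output") if isinstance(data, dict) else data
    text = output.get("content", "") if isinstance(output, dict) else str(output)
    return True, str(text)
```

Returning a flag plus a message, instead of raising, lets the UI show a readable error in the tab without a stack trace.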
Ollama Connection Errors:
# Ensure Ollama service is running
ollama serve # If not running as daemon
Gemini API Errors:
- Verify GOOGLE_API_KEY in .env is valid
- Check quota at Google Cloud Console
Port Conflicts:
# Change FastAPI port in app.py
uvicorn.run(app, host="127.0.0.1", port=8001)
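Before changing the port by hand, it may help to check which ports are actually taken. The following is a small standard-library sketch; `port_in_use` and `first_free_port` are hypothetical helpers, not part of app.py.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        return sock.connect_ex((host, port)) == 0  # 0 means the connect succeeded


def first_free_port(start: int = 8000, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Scan upward from `start` and return the first port with no listener."""
    for port in range(start, start + attempts):
        if not port_in_use(port, host):
            return port
    raise RuntimeError(f"No free port found in {start}-{start + attempts - 1}")
```

The result could be passed as the `port` argument to `uvicorn.run`; the Streamlit client would then need to target the same port.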
