Free LLM Gateway

A unified OpenAI-compatible API server that aggregates 24+ LLM providers, most of them offering free tiers, into one endpoint. Configure your provider API keys in .env, then use one base URL and one master key to access every model.

العربية | English

Features

Core

Single endpoint — http://localhost:8080/v1 (OpenAI SDK-compatible)
24+ providers — OpenRouter, GitHub Models, Groq, Cerebras, Cloudflare, HuggingFace, NVIDIA, SiliconFlow, Cohere, Google Gemini, Mistral, Kilo, LLM7, Ollama Cloud, DeepSeek, Together AI, Fireworks AI, SambaNova, Chutes, Anthropic, OpenAI, Perplexity, xAI, Novita AI, Z AI, ModelScope
260+ free models — auto-discovered from all providers
Automatic fallback — if one provider fails (rate limit, error, timeout), tries the next
Round-robin load balancing — distributes requests across providers
Streaming support — full SSE passthrough with mid-stream error handling
Smart routing — 60+ model aliases (type "gpt-4" → best available model)
Batch requests — fan out multiple requests in parallel
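
The round-robin and fallback bullets above can be pictured as a rotating start position over an ordered provider list. The sketch below is only an illustration under that assumption: the `RoundRobin` class and the provider names in the demo are hypothetical and are not taken from the gateway's source.

```python
from itertools import count


class RoundRobin:
    """Rotate the starting provider on each request (hypothetical helper)."""

    def __init__(self, providers):
        self._providers = list(providers)
        if not self._providers:
            raise ValueError("at least one provider is required")
        self._counter = count()

    def next_order(self):
        # Start one position further along on every call and wrap around,
        # so each provider still appears exactly once as a fallback candidate.
        start = next(self._counter) % len(self._providers)
        return self._providers[start:] + self._providers[:start]


# Provider identifiers here are illustrative.
rr = RoundRobin(["openrouter", "nvidia", "groq"])
first = rr.next_order()   # order tried for request 1
second = rr.next_order()  # order tried for request 2
```

Each request then walks its own order from left to right, which spreads first attempts across providers while keeping every other provider available as a fallback.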

Security & Access Control

AES-256-GCM encrypted key storage — API keys encrypted at rest with authenticated encryption
Unified gateway API keys — fgk-... prefixed keys with per-key model/provider restrictions and admin roles
Per-key rate tracking — RPM/RPD/TPM/TPD monitoring with rolling time windows and provider free-tier limits
Token estimation — pre-flight TPM/TPD checks before routing to avoid hitting limits
Key health validation — one-click test all API keys
Timing-safe key comparison — constant-time auth to prevent timing attacks
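
As an illustration of the timing-safe comparison mentioned above, a constant-time check can be built on the standard library's `hmac.compare_digest`. This is a minimal sketch, not the gateway's actual auth code; the function names `extract_bearer` and `is_authorized` are made up for the example.

```python
import hmac


def extract_bearer(header_value: str) -> str:
    """Return the token from an 'Authorization: Bearer <token>' value, or ''."""
    scheme, _, token = header_value.partition(" ")
    return token.strip() if scheme.lower() == "bearer" else ""


def is_authorized(presented_key: str, expected_key: str) -> bool:
    # compare_digest does not short-circuit on the first differing byte,
    # so the comparison time does not reveal how much of the key matched.
    return hmac.compare_digest(
        presented_key.encode("utf-8"), expected_key.encode("utf-8")
    )


ok = is_authorized(extract_bearer("Bearer your-master-key"), "your-master-key")
rejected = is_authorized(extract_bearer("Bearer wrong-key"), "your-master-key")
```

A plain `==` comparison on strings can return early at the first mismatch, which is the behaviour a constant-time comparison is meant to avoid.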

Routing & Reliability

Sticky sessions — 30-minute provider affinity for conversation continuity
Routing headers — X-Routed-Via, X-Fallback-Attempts, X-Sticky-Session on every response
Tool/function calling translation — automatic OpenAI ↔ Gemini functionDeclarations conversion
Dynamic penalty routing — providers returning 429s sink in priority; penalties decay over time
Per-key cooldown — rate-limited keys are temporarily skipped until cooldown expires
Retry with backoff — exponential backoff on 500/502/503 errors, Retry-After support on 429s
Runtime fallback editing — reorder, enable/disable fallbacks via API without restart
Sort presets — sort fallbacks by priority, penalty score, or health status
Model size labels — automatic small/medium/large/xl tagging from model names
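
One way to read "providers returning 429s sink in priority; penalties decay over time" is a per-provider score that is bumped on each 429 and decays exponentially. The sketch below assumes that model; the `PenaltyTable` class, the half-life, and the bump size are illustrative choices, not values taken from the project.

```python
class PenaltyTable:
    """Track a decaying penalty per provider (hypothetical helper)."""

    def __init__(self, half_life_s=300.0, bump=1.0):
        self.half_life_s = half_life_s  # assumed: penalty halves every 5 minutes
        self.bump = bump                # assumed: each 429 adds one penalty point
        self._state = {}                # provider -> (score, time of last update)

    def score(self, provider, now):
        value, stamp = self._state.get(provider, (0.0, now))
        # Exponential decay since the last update.
        return value * 0.5 ** ((now - stamp) / self.half_life_s)

    def record_429(self, provider, now):
        self._state[provider] = (self.score(provider, now) + self.bump, now)

    def order(self, providers, now):
        # sorted() is stable, so providers with equal penalties keep
        # their configured order.
        return sorted(providers, key=lambda p: self.score(p, now))


table = PenaltyTable(half_life_s=60.0)
table.record_429("openrouter", now=0.0)
order_after_429 = table.order(["openrouter", "nvidia"], now=1.0)
score_after_one_half_life = table.score("openrouter", now=60.0)
```

Passing `now` explicitly keeps the sketch deterministic; a real implementation would more likely read a monotonic clock.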

Analytics & Persistence

SQLite request log — every routed request persisted to disk, survives restarts
Rich analytics API — 5 dedicated endpoints: summary, by-model, by-provider, timeline, errors
Time range filtering — query analytics for 24h, 7d, or 30d windows
Error categorization — rate-limited, timeout, auth, server errors auto-categorized
Estimated cost savings — GPT-4o pricing comparison ($3/M input, $15/M output)
Usage analytics — usage tracking, token counts, provider success rates
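
The savings estimate above amounts to pricing the tokens that were routed for free at the quoted reference rates. A minimal sketch of that arithmetic follows; the function name and the token counts in the demo are made up, and the gateway's own calculation may differ in detail.

```python
# Reference prices quoted in the feature list, in USD per million tokens.
INPUT_PRICE_PER_M = 3.00
OUTPUT_PRICE_PER_M = 15.00


def estimated_savings(prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD that the same traffic would have incurred at the reference prices."""
    total = prompt_tokens * INPUT_PRICE_PER_M + completion_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Illustrative traffic: 200k prompt tokens and 50k completion tokens.
savings = estimated_savings(prompt_tokens=200_000, completion_tokens=50_000)
```

For the demo figures this is 200,000 × $3/M + 50,000 × $15/M = $0.60 + $0.75 = $1.35.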

Dashboard

Web dashboard — 18 tabs: Models, Providers, Usage, Analytics, Rich Analytics, Benchmarks, Cache, Combos, Quotas, OAuth, Setup, Keys, Logs, Playground, Rate Tracking, Sessions, Gateway Keys, Fallbacks
Interactive playground — test models directly from the dashboard with real-time responses
Fallback editor — visually reorder and toggle fallback chains from the dashboard
Auto-sync — pulls new free models from awesome-free-llm-apis
Docker support — one command to deploy

Quick Start

# 1. Clone
git clone https://github.com/MrFadiAi/free-llm-gateway.git
cd free-llm-gateway

# 2. Install dependencies
pip install -r requirements.txt

# 3. Configure API keys
cp .env.example .env
# Edit .env — add at least one provider API key

# 4. Start the gateway
python main.py

Open http://127.0.0.1:8080/ for the dashboard.

Configuration

Environment Variables (`.env`)

Variable	Description	Get Free Key
`MASTER_KEY`	Your gateway API key	Set anything you want
`OPENROUTER_KEY`	35+ free models	openrouter.ai ↗
`NVIDIA_KEY`	100+ models, no daily cap	build.nvidia.com ↗
`GITHUB_KEY`	OpenAI, Meta, Mistral models	GitHub Models ↗
`GROQ_KEY`	Ultra-fast inference	console.groq.com ↗
`CEREBRAS_KEY`	Fastest Llama inference (~2,600 tok/s)	cloud.cerebras.ai ↗
`GOOGLE_GEMINI_KEY`	Gemini 2.5 Flash, 1M context	aistudio.google.com ↗
`MISTRAL_KEY`	Mistral Small/Large/Codestral	console.mistral.ai ↗
`COHERE_KEY`	Command R+ (1K calls/month)	dashboard.cohere.com ↗
`SILICONFLOW_KEY`	Qwen, DeepSeek, GLM	siliconflow.cn ↗
`HUGGINGFACE_KEY`	Thousands of community models	huggingface.co ↗
`CLOUDFLARE_KEY`	50+ models, 10K neurons/day	Cloudflare Workers AI ↗
`KILO_KEY`	Free model gateway	kilo.ai ↗
`LLM7_KEY`	No registration needed	llm7.io ↗
`DEEPSEEK_KEY`	DeepSeek-R1 reasoning models	platform.deepseek.com ↗
`TOGETHER_KEY`	100+ open-source models, $5 free credits	together.ai ↗
`FIREWORKS_KEY`	Fast open-source inference	fireworks.ai ↗
`SAMBANOVA_KEY`	Fast Llama inference	sambanova.ai ↗
`CHUTES_KEY`	Community-driven model hosting	chutes.ai ↗
`NVIDIA_NIM_KEY`	100+ models via NVIDIA NIM	build.nvidia.com ↗
`NOVITA_KEY`	GPU cloud for open-source LLMs	novita.ai ↗
`Z_AI_KEY`	Zhipu AI permanent free models	bigmodel.cn ↗
`MODELSCOPE_KEY`	Alibaba ModelScope models	modelscope.cn ↗
`ANTHROPIC_KEY`	Claude models (paid)	anthropic.com ↗
`OPENAI_KEY`	GPT-4, GPT-4o (paid)	platform.openai.com ↗
`PERPLEXITY_KEY`	Sonar search-augmented models (paid)	perplexity.ai ↗
`XAI_KEY`	Grok models (paid)	x.ai ↗

You need only one provider key to get started.

Model Configuration (`models.yaml`)

Models are defined with ordered fallback chains:

models:
  llama-3.3-70b:
    - provider: openrouter
      model: meta-llama/llama-3.3-70b-instruct:free
    - provider: nvidia
      model: meta/llama-3.1-405b-instruct
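
To show how an ordered chain like this might be consumed, the sketch below mirrors the YAML above as a Python dict and tries each entry in turn. It is an assumption about the routing logic rather than the gateway's code: `ProviderError`, `route`, and `fake_call` are hypothetical names, and the stand-in provider call simulates a rate limit on the first entry.

```python
# The same chain as in models.yaml, written as a Python structure.
MODELS = {
    "llama-3.3-70b": [
        {"provider": "openrouter", "model": "meta-llama/llama-3.3-70b-instruct:free"},
        {"provider": "nvidia", "model": "meta/llama-3.1-405b-instruct"},
    ]
}


class ProviderError(Exception):
    """Raised when a provider call fails (rate limit, error, timeout)."""


def route(model_name, call_provider):
    """Try each entry in order; return (reply, provider, fallback_attempts)."""
    errors = []
    for attempt, entry in enumerate(MODELS[model_name]):
        try:
            reply = call_provider(entry["provider"], entry["model"])
            return reply, entry["provider"], attempt
        except ProviderError as exc:
            errors.append(f"{entry['provider']}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(errors))


def fake_call(provider, model):
    # Stand-in for a real HTTP call: the first provider is rate limited.
    if provider == "openrouter":
        raise ProviderError("429 rate limited")
    return f"hello from {model}"


reply, provider, fallback_attempts = route("llama-3.3-70b", fake_call)
```

In this run the first entry fails, so the request is served by the second entry after one fallback attempt.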

Usage

With OpenAI SDK (Python)

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="your-master-key",
)

response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

With curl

# List available models
curl http://localhost:8080/v1/models \
  -H "Authorization: Bearer your-master-key"

# Chat completion
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer your-master-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama-3.3-70b", "messages": [{"role": "user", "content": "Hello!"}]}'
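
The same chat request can be made from Python's standard library, which also makes it easy to inspect the routing headers listed under Features (X-Routed-Via, X-Fallback-Attempts, X-Sticky-Session). This is a hedged sketch: the helper names are hypothetical, and `chat_with_routing_info` needs a running gateway, so it is defined here but not called.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, model, prompt):
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat_with_routing_info(request):
    """Send the request; return the reply text and the routing headers."""
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
        routing = {
            name: response.headers.get(name)
            for name in ("X-Routed-Via", "X-Fallback-Attempts", "X-Sticky-Session")
        }
    return payload["choices"][0]["message"]["content"], routing


request = build_chat_request(
    "http://localhost:8080/v1", "your-master-key", "llama-3.3-70b", "Hello!"
)
```

With the gateway running, `text, routing = chat_with_routing_info(request)` would return the reply along with whichever of those headers the response carries.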

With Gateway API Keys

Create a gateway key for per-user access control:

# Create a gateway key via API
curl -X POST http://localhost:8080/api/gateway-keys \
  -H "Authorization: Bearer your-master-key" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-user", "is_admin": false}'

# Response: {"raw_key": "fgk-xxxx...", ...}
# Use the fgk- key as your API key

# Use the gateway key in any OpenAI-compatible client (Python)
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="fgk-your-gateway-key-here",
)

With any OpenAI-compatible tool

Point any tool that supports a custom OpenAI base URL at http://localhost:8080/v1 and use your master key (or an fgk- gateway key) as the API key. This works with Cursor, LibreChat, Open WebUI, OpenClaw, and similar tools.

API Endpoints

Chat & Models

Endpoint	Method	Description
`/v1/chat/completions`	POST	Chat completions (with streaming)
`/v1/models`	GET	List all available models
`/v1/batch`	POST	Batch requests (parallel)
`/v1/embeddings`	POST	Text embeddings
`/`	GET	Web dashboard

Gateway & Analytics

Endpoint	Method	Description
`/api/ping`	GET	Lightweight liveness probe (no auth required)
`/api/status`	GET	JSON status
`/api/analytics`	GET	Usage analytics + savings
`/api/keys/validate-all`	POST	Validate all API keys
`/api/auto-update`	GET	Re-scan providers for new models
`/api/sync-providers`	POST	Sync from awesome-free-llm-apis
`/api/playground`	POST	Interactive model testing
`/api/translate-tools`	POST	Tool format translation

Rich Analytics (SQLite-backed)

Endpoint	Method	Description
`/api/analytics/summary?range=7d`	GET	Total requests, success rate, tokens, latency, savings
`/api/analytics/by-model?range=7d`	GET	Stats grouped by provider+model
`/api/analytics/by-provider?range=7d`	GET	Stats grouped by provider
`/api/analytics/timeline?range=7d&interval=day`	GET	Time-bucketed request counts
`/api/analytics/errors?range=7d`	GET	Error distribution by category + recent errors

Ranges: 24h, 7d, 30d
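
A small helper can keep analytics queries within the documented views and ranges. The sketch below only builds URLs from the endpoints in the table above; the function name `analytics_url` is hypothetical, and whether these endpoints also require an Authorization header is not stated here, so that part is left out.

```python
from urllib.parse import urlencode

VALID_VIEWS = {"summary", "by-model", "by-provider", "timeline", "errors"}
VALID_RANGES = {"24h", "7d", "30d"}


def analytics_url(view, time_range="7d", base="http://localhost:8080", **extra):
    """Build a rich-analytics URL, rejecting views and ranges not listed above."""
    if view not in VALID_VIEWS:
        raise ValueError(f"unknown analytics view: {view!r}")
    if time_range not in VALID_RANGES:
        raise ValueError(f"unsupported range: {time_range!r}")
    # Extra keyword arguments (for example interval="day") are appended as-is.
    query = urlencode({"range": time_range, **extra})
    return f"{base}/api/analytics/{view}?{query}"


summary_url = analytics_url("summary")
timeline_url = analytics_url("timeline", "30d", interval="day")
```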

Fallback Management

Endpoint	Method	Description
`/api/fallbacks`	GET	Get all fallback chains with penalty info
`/api/fallbacks/{model}`	PUT	Reorder or toggle fallbacks for a model
`/api/fallbacks/{model}/sort/{preset}`	POST	Sort by preset: `priority`, `penalty`, `health`

Rate Tracking

Endpoint	Method	Description
`/api/rate-tracking`	GET	All per-key rate tracking data
`/api/rate-tracking/{provider}`	GET	Rate tracking for specific provider
`/api/rate-tracking/set-limits`	POST	Set custom rate limits
`/api/rate-tracking/cleanup`	POST	Clean up stale tracking data
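
The rolling-window tracking behind these endpoints can be sketched with a deque of timestamps per key. The example below assumes a simple sliding window for one metric (requests per minute); the `RollingCounter` class and its limits are illustrative, and the gateway's tracker may be structured differently.

```python
from collections import deque


class RollingCounter:
    """Allow at most `limit` events in any `window_s`-second window (hypothetical)."""

    def __init__(self, window_s, limit):
        self.window_s = window_s
        self.limit = limit
        self._events = deque()  # timestamps of accepted events, oldest first

    def _evict(self, now):
        # Drop timestamps that have fallen out of the window.
        while self._events and now - self._events[0] >= self.window_s:
            self._events.popleft()

    def allow(self, now):
        self._evict(now)
        if len(self._events) >= self.limit:
            return False
        self._events.append(now)
        return True


# RPM limit of 2 for one key; timestamps are passed in for determinism.
rpm = RollingCounter(window_s=60, limit=2)
results = [rpm.allow(0.0), rpm.allow(10.0), rpm.allow(20.0), rpm.allow(61.0)]
```

The third request is refused because two requests already sit inside the 60-second window; the fourth is accepted once the first has aged out. RPD, TPM, and TPD would follow the same pattern with a different window or with token counts in place of request counts.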

Sessions

Endpoint	Method	Description
`/api/sessions`	GET	List active sticky sessions
`/api/sessions/{id}`	DELETE	Remove a session
`/api/sessions/cleanup`	POST	Clean up expired sessions
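
The sticky sessions managed by these endpoints can be thought of as a map from session ID to provider with a 30-minute expiry. The sketch below assumes the affinity is refreshed on each use, which the README does not specify; the `StickySessions` class and its method names are hypothetical.

```python
class StickySessions:
    """Pin a conversation to one provider for a limited time (hypothetical)."""

    def __init__(self, ttl_s=30 * 60):
        self.ttl_s = ttl_s
        self._sessions = {}  # session_id -> (provider, last_used)

    def pin(self, session_id, provider, now):
        self._sessions[session_id] = (provider, now)

    def get(self, session_id, now):
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        provider, last_used = entry
        if now - last_used > self.ttl_s:
            del self._sessions[session_id]  # expired: fall back to normal routing
            return None
        # Assumption: each hit extends the affinity window.
        self._sessions[session_id] = (provider, now)
        return provider

    def cleanup(self, now):
        """Remove expired sessions and return how many were dropped."""
        expired = [sid for sid, (_, used) in self._sessions.items()
                   if now - used > self.ttl_s]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)


sessions = StickySessions()
sessions.pin("chat-1", "nvidia", now=0.0)
within_window = sessions.get("chat-1", now=600.0)           # 10 minutes later
after_expiry = sessions.get("chat-1", now=600.0 + 1801.0)   # over 30 minutes idle
```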

Gateway Keys

Endpoint	Method	Description
`/api/gateway-keys`	GET	List gateway API keys
`/api/gateway-keys`	POST	Create a new gateway key
`/api/gateway-keys/{name}`	DELETE	Revoke a gateway key
`/api/gateway-keys/{name}/toggle`	PUT	Enable/disable a key

Encrypted Key Store

Endpoint	Method	Description
`/api/encrypted-keys`	GET	List encrypted keys
`/api/encrypted-keys`	POST	Add encrypted key
`/api/encrypted-keys/{provider}/{index}`	DELETE	Remove encrypted key

Auto-Updates

Keep models fresh with zero effort:

Dashboard → Setup tab → "Sync Providers" button
Terminal → python3 sync_providers.py
Auto-cron → weekly sync from awesome-free-llm-apis

New providers and models appear automatically.

Docker

docker-compose up -d

Architecture

Any AI Tool → Gateway (localhost:8080)
               ├── Auth check (timing-safe, MASTER_KEY or fgk- gateway keys)
               ├── Per-key rate limiting (RPM/RPD/TPM/TPD)
               ├── Token estimation with TPM/TPD pre-checks
               ├── AES-256-GCM encrypted key storage
               ├── Smart routing with 60+ aliases
               ├── Model size labels (auto-inferred from names)
               ├── Round-robin load balancing
               ├── Dynamic penalty routing (429s sink priority)
               ├── Per-key cooldown on rate limits
               ├── Retry with exponential backoff (500/502/503)
               ├── Sticky sessions (30-min affinity)
               ├── Tool calling translation (OpenAI ↔ Gemini)
               ├── Streaming with mid-stream error handling
               ├── Liveness probe (/api/ping health checks)
               ├── Rate limit tracking per provider
               ├── Auto-fallback on failure
               ├── Runtime fallback chain editing (API)
               ├── Response caching (LRU + TTL)
               ├── Request queuing with backoff
               ├── SQLite persistent request log
               ├── Rich analytics API (5 endpoints)
               ├── Routing headers (X-Routed-Via, X-Fallback-Attempts)
               └── Usage analytics + savings tracker
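
The retry step in the pipeline above (exponential backoff on 500/502/503, Retry-After on 429) can be sketched as a function that maps a status code and attempt number to a wait time. The base delay, the cap, and the function name are assumptions for illustration, and only the numeric form of Retry-After is handled here.

```python
RETRYABLE_STATUSES = {500, 502, 503}


def retry_delay(status, attempt, retry_after=None, base_s=0.5, cap_s=30.0):
    """Return seconds to wait before retrying, or None if the error is not retried."""
    backoff = min(base_s * 2 ** attempt, cap_s)  # 0.5s, 1s, 2s, ... capped
    if status == 429:
        if retry_after is not None:
            try:
                # Numeric Retry-After (seconds); the HTTP-date form is not parsed.
                return min(float(retry_after), cap_s)
            except ValueError:
                pass
        return backoff
    if status in RETRYABLE_STATUSES:
        return backoff
    return None


first_retry = retry_delay(503, attempt=0)
fourth_retry = retry_delay(503, attempt=3)
rate_limited = retry_delay(429, attempt=0, retry_after="7")
not_retried = retry_delay(401, attempt=0)
```

A caller would sleep for the returned delay and try again, or move on to the next provider in the fallback chain when the function returns None or the attempt budget is used up.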

Testing

pip install pytest
python -m pytest tests/ -v

The suite contains 130 tests covering rate tracking, sticky sessions, gateway auth, tool translation, key encryption, health checks, SQLite request logging, safe streaming, fallback editing, analytics endpoints, and more.

License

MIT
