An MCP server for audio-to-text transcription using Google's Gemini multimodal models via OpenRouter.
claude mcp add gemini-transcription -s user \
-e OPENROUTER_API_KEY=your-key \
-- npx -y gemini-transcription-mcp

Add to your claude_desktop_config.json:
{
"mcpServers": {
"gemini-transcription": {
"command": "npx",
"args": ["-y", "gemini-transcription-mcp"],
"env": {
"OPENROUTER_API_KEY": "your-key"
}
}
}
}

Add via the MetaMCP UI or import JSON:
{
"mcpServers": {
"gemini-transcription": {
"command": "npx",
"args": ["-y", "gemini-transcription-mcp"],
"env": {
"OPENROUTER_API_KEY": "your-key"
},
"description": "Audio transcription using Gemini models via OpenRouter"
}
}
}

Or fill in the Add Server form manually:
| Field | Value |
|---|---|
| Command | npx |
| Arguments | -y gemini-transcription-mcp |
| Environment Variables | OPENROUTER_API_KEY=your-key |
For deployments that require HTTP transport:
# Using Docker (recommended for remote)
docker run -d \
-p 3000:3000 \
-e OPENROUTER_API_KEY=your-key \
ghcr.io/danielrosehill/gemini-transcription-mcp
# Or run directly with HTTP transport
OPENROUTER_API_KEY=your-key npx gemini-transcription-mcp --http 3000

The server exposes:
- http://host:3000/mcp - MCP endpoint (streamable HTTP)
- http://host:3000/health - Health check
| Tool | Description |
|---|---|
| transcribe_audio | Lightly edited transcript (removes filler words, applies corrections) |
| transcribe_audio_raw | Verbatim transcript with no cleanup |
| transcribe_audio_vad | VAD preprocessing to strip silence before transcription |
| transcribe_audio_format | Transcribe and format as a document type (email, to-do list, etc.) |
| transcribe_audio_large | Compresses oversized files to Opus before transcribing |
| transcribe_audio_custom | Full control with your own prompt |
| transcribe_audio_devspec | Format as a development specification for AI coding agents |
All tools accept audio via:
- file_content: Base64-encoded audio
- file_url: HTTP(S) URL to fetch
- ssh_host + ssh_path: Pull via SCP (local deployment only)
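As a rough sketch of how a client might pass one of these inputs, the snippet below builds an MCP `tools/call` request body for a chosen tool. The argument names (`file_content`, `file_url`, `ssh_host`, `ssh_path`) come from the list above; the helper `buildTranscribeRequest`, the `AudioSource` type, and the example URL are hypothetical, and the server's actual argument schema may accept further fields not shown here.

```typescript
// Hypothetical client-side helper; not part of gemini-transcription-mcp itself.
// Exactly one of the three input modes is supplied per call.
type AudioSource =
  | { file_content: string }                  // Base64-encoded audio
  | { file_url: string }                      // HTTP(S) URL to fetch
  | { ssh_host: string; ssh_path: string };   // Pull via SCP (local deployment only)

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: AudioSource };
}

// Wrap a tool name and an audio source in a JSON-RPC 2.0 "tools/call" message.
function buildTranscribeRequest(
  id: number,
  tool: string,
  source: AudioSource
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: tool, arguments: source },
  };
}

// Example: ask for a lightly edited transcript of a remote file.
const request = buildTranscribeRequest(1, "transcribe_audio", {
  file_url: "https://example.com/meeting.mp3", // placeholder URL
});
const requestJson = JSON.stringify(request);
```

In practice an MCP client library would normally construct this message for you; the sketch only shows where the audio source ends up in the request.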
- Native: MP3, WAV, OGG, FLAC, AAC, AIFF
- Auto-converted: Opus, M4A, WebM, WMA, and others (converted to OGG/Opus)
Note: When manually converting audio, prefer MP3 over WAV. MP3 offers good compression with broad compatibility, while WAV files are unnecessarily large.
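To illustrate the two format groups above, here is a minimal sketch that classifies a file by its extension. The function `classifyAudioFormat` and the `"unlisted"` label are assumptions made for this example, and matching on file extension is a simplification: the server may detect formats differently, and the "and others" in the list above means an unlisted extension may still be converted.

```typescript
type FormatHandling = "native" | "auto-converted" | "unlisted";

// Extensions taken from the format lists in this README.
const FORMAT_HANDLING: Record<string, FormatHandling> = {
  mp3: "native",
  wav: "native",
  ogg: "native",
  flac: "native",
  aac: "native",
  aiff: "native",
  opus: "auto-converted",
  m4a: "auto-converted",
  webm: "auto-converted",
  wma: "auto-converted",
};

// Hypothetical helper: look up how a file would be handled, based on its extension.
function classifyAudioFormat(filename: string): FormatHandling {
  const dot = filename.lastIndexOf(".");
  const ext = dot >= 0 ? filename.slice(dot + 1).toLowerCase() : "";
  // Own-property check so names like "constructor" are not matched by accident.
  if (Object.prototype.hasOwnProperty.call(FORMAT_HANDLING, ext)) {
    return FORMAT_HANDLING[ext];
  }
  return "unlisted"; // may still be auto-converted by the server
}
```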
| Environment Variable | Description |
|---|---|
| OPENROUTER_API_KEY | Required. Your OpenRouter API key |
| OPENROUTER_MODEL | Optional. Model to use (default: Gemini Flash Lite) |
| TRANSCRIPT_OUTPUT_DIR | Optional. Auto-save location (default: ./transcripts). Set to empty string to disable. |
| MCP_TRANSPORT | Optional. Set to http for HTTP transport mode |
| MCP_PORT | Optional. Port for HTTP mode (default: 3000) |
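The defaults in the table can be read as the following resolution logic. This is only a sketch of the documented behavior, not the server's actual code: `resolveConfig` and `ServerConfig` are hypothetical names, treating any `MCP_TRANSPORT` value other than `http` as stdio is an assumption, and the model is left unset when `OPENROUTER_MODEL` is absent because the README does not give the default model's exact identifier.

```typescript
interface ServerConfig {
  apiKey: string;
  model?: string;              // undefined: let the server pick its default model
  outputDir: string | null;    // null: transcript auto-save disabled
  transport: "stdio" | "http";
  port: number;
}

// Hypothetical helper mirroring the environment variable table.
function resolveConfig(env: Record<string, string | undefined>): ServerConfig {
  const apiKey = env.OPENROUTER_API_KEY;
  if (!apiKey) {
    throw new Error("OPENROUTER_API_KEY is required");
  }

  // Unset means the default directory; an empty string disables auto-save.
  const rawDir = env.TRANSCRIPT_OUTPUT_DIR;
  const outputDir =
    rawDir === undefined ? "./transcripts" : rawDir === "" ? null : rawDir;

  const port = env.MCP_PORT ? Number(env.MCP_PORT) : 3000;
  if (Math.floor(port) !== port || port < 1 || port > 65535) {
    throw new Error("MCP_PORT must be an integer between 1 and 65535");
  }

  return {
    apiKey,
    model: env.OPENROUTER_MODEL || undefined,
    outputDir,
    // Assumption: anything other than "http" falls back to stdio.
    transport: env.MCP_TRANSPORT === "http" ? "http" : "stdio",
    port,
  };
}

// Example: HTTP mode with every other setting left at its default.
const config = resolveConfig({
  OPENROUTER_API_KEY: "your-key",
  MCP_TRANSPORT: "http",
});
```

In a Node.js process, passing `process.env` as the argument would apply the same logic to the real environment.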
Uses stdio transport. All features are available, including SSH file retrieval.
# Via npx (recommended)
npx gemini-transcription-mcp
# Or install globally
npm install -g gemini-transcription-mcp
gemini-transcription-mcp

Uses HTTP transport. Requires a container or server with ffmpeg installed.
Docker Compose:
# docker-compose.yml
services:
  gemini-transcription:
    image: ghcr.io/danielrosehill/gemini-transcription-mcp
    ports:
      - "3000:3000"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}

# Create .env file with your API key
echo "OPENROUTER_API_KEY=your-key" > .env
# Start the service
docker compose up -d

| Feature | Local (stdio) | Remote (HTTP) |
|---|---|---|
| Base64 audio input | Yes | Yes |
| URL audio input | Yes | Yes |
| SSH file retrieval | Yes | No* |
| Transcript auto-save | Yes | Container volume |
| VAD preprocessing | Yes | Yes |
| Format conversion | Yes | Yes |
* SSH retrieval requires local access to SSH keys and network.
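A client that targets both deployment modes could encode the input-related rows of this table as a small lookup, as in the sketch below. The names `SUPPORTED_INPUTS` and `canUseInput` are hypothetical and only restate the table; they are not part of the server.

```typescript
type Transport = "stdio" | "http";
type InputMode = "file_content" | "file_url" | "ssh";

// Input modes per deployment, as listed in the feature table.
const SUPPORTED_INPUTS: Record<Transport, InputMode[]> = {
  stdio: ["file_content", "file_url", "ssh"],
  http: ["file_content", "file_url"], // SSH retrieval needs local keys and network
};

// Hypothetical helper: check an input mode before sending a request.
function canUseInput(transport: Transport, mode: InputMode): boolean {
  return SUPPORTED_INPUTS[transport].indexOf(mode) !== -1;
}
```

For a remote deployment, a client would fall back to `file_url` or `file_content` when `canUseInput("http", "ssh")` returns false.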
- Node.js 18+
- ffmpeg (for format conversion and VAD preprocessing)
- OpenRouter API key
When using Docker, ffmpeg is included in the image.
git clone https://github.com/danielrosehill/Gemini-Transcription-MCP.git
cd Gemini-Transcription-MCP
npm install
npm run build
# Run locally
OPENROUTER_API_KEY=your-key npm start
# Run with HTTP transport
OPENROUTER_API_KEY=your-key MCP_TRANSPORT=http npm start
# Build Docker image
docker build -t gemini-transcription-mcp .

MIT