danielrosehill/Gemini-Transcription-MCP

Gemini Transcription MCP

An MCP server for audio-to-text transcription using Google's Gemini multimodal models, accessed via OpenRouter.

Quick Start

Claude Code (Recommended)

claude mcp add gemini-transcription -s user \
  -e OPENROUTER_API_KEY=your-key \
  -- npx -y gemini-transcription-mcp

Claude Desktop

Add to your claude_desktop_config.json:

{
  "mcpServers": {
    "gemini-transcription": {
      "command": "npx",
      "args": ["-y", "gemini-transcription-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "your-key"
      }
    }
  }
}
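
If the config file already lists other servers, the entry above needs to be merged in rather than pasted over the whole file. A minimal sketch of that merge follows; the `add_server` helper is hypothetical, and the path to `claude_desktop_config.json` is left to the caller because it differs by platform.

```python
import json
from pathlib import Path


def add_server(config_path: Path, api_key: str) -> dict:
    """Merge the gemini-transcription entry into a Claude Desktop config file."""
    # Start from the existing config (if any) so other servers are preserved.
    text = config_path.read_text().strip() if config_path.exists() else ""
    config = json.loads(text) if text else {}

    # Same entry as shown in the README, with the caller's key filled in.
    servers = config.setdefault("mcpServers", {})
    servers["gemini-transcription"] = {
        "command": "npx",
        "args": ["-y", "gemini-transcription-mcp"],
        "env": {"OPENROUTER_API_KEY": api_key},
    }

    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_server(Path("claude_desktop_config.json"), "your-key")` rewrites the file in place, so it is worth backing it up first.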

MetaMCP

Add via the MetaMCP UI or import JSON:

{
  "mcpServers": {
    "gemini-transcription": {
      "command": "npx",
      "args": ["-y", "gemini-transcription-mcp"],
      "env": {
        "OPENROUTER_API_KEY": "your-key"
      },
      "description": "Audio transcription using Gemini models via OpenRouter"
    }
  }
}

Or fill in the Add Server form manually:

| Field | Value |
| --- | --- |
| Command | npx |
| Arguments | -y gemini-transcription-mcp |
| Environment Variables | OPENROUTER_API_KEY=your-key |

Remote Deployment (HTTP Transport)

For deployments that require HTTP transport:

# Using Docker (recommended for remote)
docker run -d \
  -p 3000:3000 \
  -e OPENROUTER_API_KEY=your-key \
  ghcr.io/danielrosehill/gemini-transcription-mcp

# Or run directly with HTTP transport
OPENROUTER_API_KEY=your-key npx gemini-transcription-mcp --http 3000

The server exposes:

  • http://host:3000/mcp - MCP endpoint (streamable HTTP)
  • http://host:3000/health - Health check
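
A quick way to confirm a remote deployment is reachable is to poll the health endpoint. The sketch below uses only the Python standard library. The helper names are hypothetical, and treating an HTTP 200 as "healthy" is an assumption, since the response body of `/health` is not documented here.

```python
import urllib.error
import urllib.request


def endpoint_urls(host: str, port: int = 3000) -> dict:
    """Build the two documented endpoint URLs for a given host and port."""
    base = f"http://{host}:{port}"
    return {"mcp": f"{base}/mcp", "health": f"{base}/health"}


def is_healthy(host: str, port: int = 3000, timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200 (assumed success)."""
    url = endpoint_urls(host, port)["health"]
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx status.
        return False
```

For example, `is_healthy("localhost")` checks the default port 3000 used in the Docker command above.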

Tools

| Tool | Description |
| --- | --- |
| transcribe_audio | Lightly edited transcript (removes filler words, applies corrections) |
| transcribe_audio_raw | Verbatim transcript with no cleanup |
| transcribe_audio_vad | VAD preprocessing to strip silence before transcription |
| transcribe_audio_format | Transcribe and format as a document type (email, to-do list, etc.) |
| transcribe_audio_large | Compresses oversized files to Opus before transcribing |
| transcribe_audio_custom | Full control with your own prompt |
| transcribe_audio_devspec | Format as a development specification for AI coding agents |
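
MCP clients invoke these tools with a JSON-RPC `tools/call` request. The following sketch builds such a request body for one of the tools listed above. The `build_tool_call` helper is hypothetical, and in practice an MCP client library handles session setup and message framing for you.

```python
import json

# Tool names as listed in this README.
TOOLS = {
    "transcribe_audio",
    "transcribe_audio_raw",
    "transcribe_audio_vad",
    "transcribe_audio_format",
    "transcribe_audio_large",
    "transcribe_audio_custom",
    "transcribe_audio_devspec",
}


def build_tool_call(tool: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for one of the transcription tools."""
    if tool not in TOOLS:
        raise ValueError(f"unknown tool: {tool}")
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(request)
```

Any per-tool arguments beyond the audio input (for example a prompt for `transcribe_audio_custom`) are not specified here, so check the tool schema reported by the server before relying on them.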

Input Methods

All tools accept audio via:

  • file_content: Base64-encoded audio
  • file_url: HTTP(S) URL to fetch
  • ssh_host + ssh_path: Pull via SCP (local deployment only)
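
The three input methods map to different argument keys. A minimal sketch of preparing those arguments is shown below. The helper names are hypothetical; only the keys `file_content`, `file_url`, `ssh_host`, and `ssh_path` come from the list above.

```python
import base64
from pathlib import Path


def audio_arguments(source: str) -> dict:
    """Return tool arguments for a local file path or an HTTP(S) URL."""
    if source.startswith(("http://", "https://")):
        # Let the server fetch the audio itself.
        return {"file_url": source}
    # Otherwise read the local file and send it as Base64 text.
    data = Path(source).read_bytes()
    return {"file_content": base64.b64encode(data).decode("ascii")}


def ssh_arguments(host: str, path: str) -> dict:
    """Return tool arguments for SCP retrieval (local deployments only)."""
    return {"ssh_host": host, "ssh_path": path}
```

Base64 adds roughly a third to the payload size, so for large recordings a URL may be the more practical choice.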

Supported Formats

  • Native: MP3, WAV, OGG, FLAC, AAC, AIFF
  • Auto-converted: Opus, M4A, WebM, WMA, and others (converted to OGG/Opus)

Note: When manually converting audio, prefer MP3 over WAV. MP3 offers good compression with broad compatibility, while WAV files are unnecessarily large.
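
To check ahead of time whether a file will be passed through as-is or converted, the format lists above can be expressed as a small lookup. This is an illustrative sketch and not the server's own detection logic. It keys on file extension only, which is an assumption, and it reports anything outside the two named lists as "unlisted" because the "and others" category is not enumerated.

```python
from pathlib import Path

# Formats named in this README.
NATIVE = {"mp3", "wav", "ogg", "flac", "aac", "aiff"}
AUTO_CONVERTED = {"opus", "m4a", "webm", "wma"}


def classify_format(filename: str) -> str:
    """Classify a file by extension as native, auto-converted, or unlisted."""
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in NATIVE:
        return "native"
    if ext in AUTO_CONVERTED:
        return "auto-converted"
    return "unlisted"
```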

Configuration

| Environment Variable | Description |
| --- | --- |
| OPENROUTER_API_KEY | Required. Your OpenRouter API key |
| OPENROUTER_MODEL | Optional. Model to use (default: Gemini Flash Lite) |
| TRANSCRIPT_OUTPUT_DIR | Optional. Auto-save location (default: ./transcripts). Set to empty string to disable. |
| MCP_TRANSPORT | Optional. Set to http for HTTP transport mode |
| MCP_PORT | Optional. Port for HTTP mode (default: 3000) |
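
The defaults described above can be summarized as a small resolver. This sketch is illustrative and is not the server's actual configuration code. The `Settings` and `load_settings` names are hypothetical, the `stdio` fallback for the transport is an assumption based on the local deployment notes, and the model is left as `None` when unset because the exact default model identifier is not given here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    api_key: str
    model: Optional[str]       # None means "use the server's default model"
    output_dir: Optional[str]  # None means auto-save is disabled
    transport: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve the documented environment variables and their defaults."""
    api_key = env.get("OPENROUTER_API_KEY", "")
    if not api_key:
        raise ValueError("OPENROUTER_API_KEY is required")

    # Unset -> ./transcripts; explicitly empty -> auto-save disabled.
    output_dir = env.get("TRANSCRIPT_OUTPUT_DIR", "./transcripts")

    return Settings(
        api_key=api_key,
        model=env.get("OPENROUTER_MODEL") or None,
        output_dir=output_dir or None,
        transport=env.get("MCP_TRANSPORT", "stdio"),  # assumed default
        port=int(env.get("MCP_PORT", "3000")),
    )
```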

Deployment Options

Local (Claude Code, Claude Desktop)

Uses stdio transport. All features available including SSH file retrieval.

# Via npx (recommended)
npx gemini-transcription-mcp

# Or install globally
npm install -g gemini-transcription-mcp
gemini-transcription-mcp

Remote/Docker (MetaMCP, Aggregators)

Uses HTTP transport. Requires a container or server with ffmpeg installed.

Docker Compose:

# docker-compose.yml
services:
  gemini-transcription:
    image: ghcr.io/danielrosehill/gemini-transcription-mcp
    ports:
      - "3000:3000"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}

# Create .env file with your API key
echo "OPENROUTER_API_KEY=your-key" > .env

# Start the service
docker compose up -d

Feature Availability by Deployment Type

| Feature | Local (stdio) | Remote (HTTP) |
| --- | --- | --- |
| Base64 audio input | Yes | Yes |
| URL audio input | Yes | Yes |
| SSH file retrieval | Yes | No* |
| Transcript auto-save | Yes | Container volume |
| VAD preprocessing | Yes | Yes |
| Format conversion | Yes | Yes |

* SSH retrieval requires local access to SSH keys and network.
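
A client that targets both deployment types may want to reject unsupported input methods before sending a request. The sketch below encodes the input rows of the table above. The function names and the "stdio"/"http" labels are hypothetical conventions for this example.

```python
def input_method(arguments: dict) -> str:
    """Identify which documented input method a set of tool arguments uses."""
    if "file_content" in arguments:
        return "base64"
    if "file_url" in arguments:
        return "url"
    if "ssh_host" in arguments and "ssh_path" in arguments:
        return "ssh"
    raise ValueError("no recognized audio input in arguments")


def is_supported(arguments: dict, transport: str) -> bool:
    """Apply the availability table: SSH retrieval works only over stdio."""
    method = input_method(arguments)
    if method == "ssh":
        return transport == "stdio"
    return transport in ("stdio", "http")
```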

Requirements

Non-Docker deployments need ffmpeg installed on the host. When using Docker, ffmpeg is included in the image.

Building from Source

git clone https://github.com/danielrosehill/Gemini-Transcription-MCP.git
cd Gemini-Transcription-MCP
npm install
npm run build

# Run locally
OPENROUTER_API_KEY=your-key npm start

# Run with HTTP transport
OPENROUTER_API_KEY=your-key MCP_TRANSPORT=http npm start

# Build Docker image
docker build -t gemini-transcription-mcp .

License

MIT
