English | 简体中文 | 繁體中文 | 日本語 | 한국어 | Español | Français | Deutsch
Lightweight, privacy-first voice dictation that types into any Mac app, with no time limit.
AtomVoice makes cloud use explicit. Sherpa-ONNX is fully offline, Apple Speech can be forced on-device when the current language supports it, and Doubao Cloud ASR / LLM Refinement are opt-in. AtomVoice itself does not run a server, keep recordings, or store transcript history.
The app bundle is under 3 MB and runs no background daemons. Sherpa-ONNX local recognition can use the CoreML backend to reduce CPU load and power usage; Sherpa runtime, ASR models, and punctuation models are downloaded on demand, and large local models can be released after idle time or critical memory pressure.
AtomVoice does not take over your system input method or change how you type. Keep your existing Chinese/English IMEs, shortcuts, and preferences; type when you want, speak when you want. For long sentences, quick additions, or cross-app writing, voice input can pick up wherever typing leaves off.
- Hold-to-talk or tap-to-talk — your choice, with optional ASR-text-driven silence auto-stop to reduce false cutoffs for quiet speech or noisy rooms
- Customizable trigger key — pick whichever modifier fits your keyboard
- In-recording shortcuts — cancel the take, inject immediately and skip LLM polish, or end with a punctuation in one keypress
- Headphone Voice Control (Beta) — use the headphone play/pause button for voice input; verified with USB headset / DAC controls and 3.5 mm headset buttons: single press follows your input mode, long press talks, double press sends Return
- Auto-cancel on app switch (hold mode only)
- Apple Speech Recognition — system engine with streaming, optional on-device mode, and segmented rolling that breaks the 1-minute SFSpeechRecognizer limit
- Sherpa-ONNX — fully offline local engine with language-specific model presets, CPU/CoreML backends, on-demand runtime/model/punctuation downloads, auto-unload, and third-party model import
- Doubao Cloud ASR — optional Volcengine streaming recognition with API Key stored in Keychain, ITN, smart punctuation, text smoothing, optional final-pass recognition, and Apple Speech fallback on cloud failures
- 8 app and recognition languages — English, 简体中文, 繁體中文, 日本語, 한국어, Español, Français, Deutsch; Sherpa also supports imported third-party models for languages without built-in presets
- Apple Live Insertion — completed sentences are injected during recording, no need to wait for release
- Smart punctuation — local heuristic punctuator (per-language); skipped automatically when the cursor is already followed by punctuation
- CJK IME compatible — temporarily switches to ASCII layout before paste, restores after
- LLM Refinement (Beta) — text-only post-processing with OpenAI-compatible and Anthropic APIs, streaming preview, 10 preset providers + fully editable custom list, multilingual default system prompt or your own
- 5-band FFT spectrum waveform tuned for the human voice (100–4200 Hz), driven by Accelerate
- Three animation styles — Dynamic Island (Spotlight-style spring + Gaussian blur), Minimal, None — three speeds, ProMotion 120 Hz aware
- Liquid Glass on macOS 26, Visual Effect blur on macOS 14/15
- 8 UI languages, auto-detected from system
- First-run setup guides permissions, input mode, and recognition-engine choice
- Auto update from GitHub Releases with SHA256 and code-signature verification (optional Beta channel)
- Launch at login (SMAppService)
- Audio input device picker — choose any system microphone
- Audio route resilience — recording can recover when headphones, AirPods, or input devices change mid-session; audio is resampled per recognition engine
- Lower system volume while recording (optional)
- Single-instance protection — old instance is terminated automatically on launch, reducing multiple app copies competing for headphone button events
- macOS 14 Sonoma or later
- Permissions: Accessibility, Microphone, Speech Recognition
From Release (recommended)
Download from Releases, unzip, drag to Applications. Three architectures are published per release: Universal / Apple Silicon / Intel.
Homebrew
brew tap BlackSquarre/tap
brew install --cask atomvoiceBuild from source
git clone https://github.com/BlackSquarre/AtomVoice.git
cd AtomVoice
make dev
open dist/Test/AtomVoice.appFor architecture and localization coverage checks, run:
make test
make lint-locmake lint-loc scans loc("key") usage against the 8 localization directories to catch missing UI translations.
The Makefile bundles and signs the app with the Apple Development identity configured in Makefile; change that identity if you build on another Mac.
Not notarized. On first open:
- Right-click
AtomVoice.app→ Open → click Open - Or go to System Settings → Privacy & Security → Open Anyway
- Or run:
xattr -cr /Applications/AtomVoice.app
| Action | Result |
|---|---|
| Hold trigger key | Start recording (hold mode) |
| Release trigger key | Stop and inject text |
| Tap trigger key | Start / stop recording (tap mode) |
| Click capsule while recording | Stop and inject text |
ESC while recording |
Cancel, no text injected |
Space / Backspace while recording |
Inject immediately, skip LLM |
| Type punctuation while recording | Inject + append that punctuation |
| Headphone play/pause button (optional, Beta) | Single press follows input mode, long press records, double press sends Return |
| Menu bar icon | Switch engine / language / input mode / animation / ASR settings / LLM |
- Apple Speech works out of the box. Enable on-device recognition when the selected language supports it.
- Sherpa-ONNX can be configured in Recognition Engine Settings → Sherpa Local. Choose language, model preset, CPU/CoreML provider, auto-unload delay, or import a third-party model package.
- Doubao Cloud ASR can be configured in Recognition Engine Settings → Doubao Cloud ASR. Enter your Volcengine API Key, choose the model version, and keep or edit the WebSocket endpoint. The first switch to Doubao asks for cloud-audio confirmation.
Menu bar → LLM Refinement (Beta) → Settings — pick a provider preset or add your own, enter API key and model name. Streaming output is previewed live in the capsule.
Built-in presets: OpenAI / Anthropic / DeepSeek / Moonshot (Kimi) / Qwen / GLM / Yi / Groq / Ollama (local) / Custom.
The default system prompt is tuned for dictation polish (fix homophones, mis-transcribed product/API names, fillers, punctuation) and switches automatically by recognition language. You can override it with your own prompt. LLM Refinement sends recognized text, not audio, to the provider you select.
Contributions welcome — see CONTRIBUTING.md for build, test, and PR guidelines.
For security reports, see SECURITY.md.
Apache 2.0 — see LICENSE for details.
Third-party notices (Sherpa-ONNX / ONNX Runtime) are in THIRD_PARTY_NOTICES.md.
