
stt2desktop

License: GPL-3.0-or-later. Available on PyPI as stt2desktop.

Local speech-to-text for desktop using faster-whisper.

Lets you dictate text into any application without sending audio to any cloud service. Everything runs locally on your machine: no internet connection is required after the initial model download.

Currently only tested under Linux with KDE ;)

How it works

  1. Run stt2desktop listen (or ./cli.py listen from a source checkout). The Whisper model is downloaded on first run and cached on disk.
  2. Hold Scroll Lock to record from your microphone
  3. Release Scroll Lock — the audio is transcribed locally by faster-whisper
  4. The transcribed text is copied to the clipboard via wl-copy and pasted into the focused window via ydotool key ctrl+v
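The four steps above boil down to a hold-to-record state machine. This is a minimal, hypothetical sketch (`HoldToRecord` and its callbacks are illustrative names, not stt2desktop's code). It assumes the Linux evdev convention for key events, where the value is 1 for key down, 0 for key up and 2 for autorepeat while the key is held; ignoring autorepeat is what keeps a long press from starting several recordings.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

# Values of Linux evdev EV_KEY events
KEY_UP, KEY_DOWN, KEY_REPEAT = 0, 1, 2


@dataclass
class HoldToRecord:
    """Hypothetical sketch: record while the hotkey is held, transcribe on release."""

    start_recording: Callable[[], None]
    stop_and_transcribe: Callable[[], None]
    recording: bool = field(default=False, init=False)

    def handle(self, value: int) -> None:
        if value == KEY_DOWN and not self.recording:
            self.recording = True
            self.start_recording()
        elif value == KEY_UP and self.recording:
            self.recording = False
            self.stop_and_transcribe()
        # KEY_REPEAT events and duplicate down/up events are ignored, so holding
        # the key for several seconds still triggers exactly one recording.


# Usage: simulate a press, two autorepeats and a (duplicated) release.
calls: list[str] = []
machine = HoldToRecord(lambda: calls.append("start"), lambda: calls.append("stop"))
for value in (KEY_DOWN, KEY_REPEAT, KEY_REPEAT, KEY_UP, KEY_UP):
    machine.handle(value)
print(calls)  # ['start', 'stop']
```

In a real listener, the values would come from the hotkey's key events and the callbacks would start the microphone capture and run the transcription.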

Tools used:

  • faster-whisper for local speech recognition
  • ydotool to simulate keyboard input (works on Wayland and X11)
  • wl-clipboard (wl-copy) to paste text via clipboard — avoids keyboard layout issues
  • chime to play notification sounds

Prepare installation

Requirements: Python 3.12+, a working microphone, wl-clipboard, ydotool and ydotoold. Install them and allow your user to access /dev/uinput:

sudo apt install ydotool ydotoold wl-clipboard
sudo usermod -aG input $USER
echo 'KERNEL=="uinput", GROUP="input", MODE="0660"' | sudo tee /etc/udev/rules.d/60-uinput.rules
sudo udevadm control --reload-rules && sudo udevadm trigger

Then re-login (or run newgrp input in the current shell) so the group change takes effect.
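To verify these preparation steps, a small check like the following can help. It is a hypothetical helper, not part of stt2desktop. It checks that the required executables are on `PATH` via `shutil.which`, that `/dev/uinput` is writable for the current user (the udev rule above grants this to the `input` group), and whether the running process already has the `input` group, which is only the case after re-login or `newgrp input`.

```python
import grp
import os
import shutil

REQUIRED_TOOLS = ("ydotool", "ydotoold", "wl-copy")


def missing_tools(which=shutil.which) -> list[str]:
    """Return the required executables that cannot be found on PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


def uinput_accessible(path: str = "/dev/uinput") -> bool:
    """True if the current user may write to the uinput device."""
    return os.access(path, os.W_OK)


def in_input_group(group_name: str = "input") -> bool:
    """True if this process already runs with the given group."""
    try:
        gid = grp.getgrnam(group_name).gr_gid
    except KeyError:  # the group does not exist on this system
        return False
    return gid in os.getgroups() or gid == os.getegid()


if __name__ == "__main__":
    print("missing tools:", ", ".join(missing_tools()) or "none")
    print("/dev/uinput writable:", uinput_accessible())
    print("'input' group active:", in_input_group())
```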

Install via pipx

You can install "stt2desktop" with pipx:

sudo apt install pipx
pipx install stt2desktop

Then run:

stt2desktop listen

The default global hotkey is Scroll Lock (labelled "Rollen" on German keyboards). You can change it via the --hotkey option, which takes an evdev key name (see below). Suggested alternatives: KEY_RIGHTCTRL, KEY_RIGHTALT, KEY_RIGHTMETA, KEY_RIGHTSHIFT ;)

CLI listen

usage: stt2desktop listen [-h] [LISTEN OPTIONS]

Start the STT listener. Hold the hotkey to record, release to transcribe and insert.

╭─ options ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ -h, --help                show this help message and exit                                                            │
│ -v, --verbosity           Verbosity level; e.g.: -v, -vv, -vvv, etc. (repeatable)                                    │
│ --model {tiny_en,tiny,base_en,base,small_en,small,medium_en,medium,large_v1,large_v2,large_v3,large,distil_large_v2, │
│ distil_medium_en,distil_small_en,distil_large_v3,distil_large_v3_5,large_v3_turbo,turbo}                             │
│                           Whisper model to use for transcription. (default: small)                                   │
│ --hotkey STR              evdev key name to hold for recording. Release to transcribe and insert text. Examples:     │
│                           KEY_SCROLLLOCK, KEY_RIGHTCTRL, KEY_RIGHTALT. (default: KEY_SCROLLLOCK)                     │
│ --sample-rate INT         Audio sample rate in Hz. Whisper expects 16000. (default: 16000)                           │
│ --device STR              Device to run inference on, e.g. cpu or cuda. (default: auto)                              │
│ --compute-type STR        Quantization type, e.g. int8, float16, float32. (default: int8)                            │
│ --num-workers {None}|INT  Number of parallel transcription workers. Defaults to CPU count. (default: None)           │
│ --sounds, --no-sounds     Play notification sounds via chime. (default: True)                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Whisper models

A selection of models with approximate values:

Model     Size      Speed     Accuracy
tiny      ~75 MB    fastest   lowest
base      ~145 MB   fast      good
small     ~460 MB   slower    better (default)
medium    ~1.5 GB   slow      high

Larger models produce more accurate transcriptions but take longer to process ;)

Troubleshooting

Use pavucontrol to check your audio setup and make sure the correct microphone is selected and working.

Test audio recording:

./cli.py test-recording
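If a recording comes out empty or silent, checking its level helps to tell a muted or wrong microphone apart from a transcription problem. The following is a hypothetical stdlib-only helper, not what test-recording does internally. It assumes a 16-bit PCM WAV file, for example one recorded with `arecord -f S16_LE -r 16000 -c 1 test.wav` from alsa-utils, and reports peak and RMS level relative to full scale.

```python
import array
import math
import sys
import wave


def wav_levels(path: str) -> tuple[float, float]:
    """Return (peak, rms) of a 16-bit PCM WAV file, both in the range 0.0..1.0."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        samples = array.array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return 0.0, 0.0
    peak = max(abs(s) for s in samples) / 32768
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / 32768
    return peak, rms


if __name__ == "__main__" and len(sys.argv) > 1:
    peak, rms = wav_levels(sys.argv[1])
    print(f"peak: {peak:.3f}  rms: {rms:.4f}")
    if peak < 0.01:  # arbitrary threshold for "practically silent"
        print("Recording is (almost) silent: check the input device and its volume.")
```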

Some terminal commands to check your audio setup:

# List capture devices in the PulseAudio sound server:
pactl list sources short

# Check current volume:
pactl list sources | grep -A1 "Name: .*input\|Volume:"

# Display the current PipeWire state:
wpctl status

Set up loopback mode to hear yourself:

# Start:
pactl load-module module-loopback
# Undo:
pactl unload-module module-loopback

Start development

At least uv is needed. Install it, e.g. via pipx:

sudo apt-get install pipx
pipx install uv

Clone the project and run the CLI help commands; a virtual environment is created or updated automatically.

~$ git clone https://github.com/jedie/stt2desktop.git
~$ cd stt2desktop
~/stt2desktop$ ./cli.py --help
~/stt2desktop$ ./dev-cli.py --help
usage: ./dev-cli.py [-h] {coverage,install,lint,mypy,nox,pip-audit,publish,test,update,update-readme-history,update-test-snapshot-files,version}



╭─ options ────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ -h, --help    show this help message and exit                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ subcommands ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ (required)                                                                                                           │
│   • coverage  Run tests and show coverage report.                                                                    │
│   • install   Install requirements and 'stt2desktop' via pip as editable.                                            │
│   • lint      Check/fix code style by run: "ruff check --fix"                                                        │
│   • mypy      Run Mypy (configured in pyproject.toml)                                                                │
│   • nox       Run nox                                                                                                │
│   • pip-audit                                                                                                        │
│               Run pip-audit check against current requirements files                                                 │
│   • publish   Build and upload this project to PyPi                                                                  │
│   • test      Run unittests                                                                                          │
│   • update    Update dependencies (uv.lock) and git pre-commit hooks                                                 │
│   • update-readme-history                                                                                            │
│               Update project history base on git commits/tags in README.md Will be exited with 1 if the README.md    │
│               was updated otherwise with 0.                                                                          │
│                                                                                                                      │
│               Also, callable via e.g.:                                                                               │
│                   python -m cli_base update-readme-history -v                                                        │
│   • update-test-snapshot-files                                                                                       │
│               Update all test snapshot files (by remove and recreate all snapshot files)                             │
│   • version   Print version and exit                                                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

History

  • v0.3.0
    • 2026-04-23 - avoid double hotkey processing
    • 2026-04-23 - nicer exit
    • 2026-04-23 - fix code style
    • 2026-04-23 - Use a lock file to ensure that only one instance is running
    • 2026-04-23 - restore old clipboard after pasting the STT text
  • v0.2.0
    • 2026-04-22 - paste text via clipboard to avoid keyboard layout issues
    • 2026-04-16 - Add test commands and migrate to ydotool
  • v0.1.2
    • 2026-03-30 - print warning when not running on Linux
    • 2026-03-30 - Update requirements
    • 2026-03-27 - Update README
  • v0.1.1
    • 2026-03-27 - +Proposal for alternative hotkey
    • 2026-03-27 - fix color outputs
    • 2026-03-27 - Update requirements
    • 2026-03-27 - add missing license file.
  • v0.1.0
    • 2026-03-27 - Use chime to play notification sounds
    • 2026-03-27 - Try to fix github CI run
    • 2026-03-27 - Cleanup README
    • 2026-03-27 - pipx usage
  • v0.0.1
    • 2026-03-26 - Add POC
    • 2026-03-26 - init
