🚧 Actively in development — new features, gestures, and platform support are being added continuously. Star or watch to follow progress.
Real-time hand gesture communication — type into any app on any device using only your hands.
GestureKey turns your laptop camera into a full gesture keyboard. It recognises ASL-inspired hand shapes, predicts words as you spell, and injects text directly into WhatsApp, Gmail, Google Docs, or any application — no controller, no wearable, no plug-in required.
| Feature | GestureKey | Others |
|---|---|---|
| Personal ML model trained to your hands | ✅ | ❌ Generic |
| Full A–Z alphabet + word prediction | ✅ | ❌ 5–10 gestures |
| Types into every app on your OS | ✅ | ❌ Locked in-app |
| Train your own model — no code needed | ✅ | ❌ |
| Hybrid ML + rule fallback, always works | ✅ | ❌ |
| Open, extensible gesture vocabulary | ✅ | ❌ |
- Live hand tracking — MediaPipe Hands, 21-landmark skeleton overlay
- Hybrid classifier — MLP neural network (your personal model) with rule-based fallback; always responsive even before training
- In-app training — record gesture samples, train a model, and activate it without leaving the app
- 96%+ accuracy on trained gestures
- Word prediction — Trie + bigram engine surfaces 5 candidates as you spell; selection by gesture or click
- Gesture trail — last 6 gestures shown live in the UI
- Text injection — pyautogui types into any active OS window
- Animated hold ring — circular arc shows hold progress before a gesture fires
- Settings — cooldown, hold threshold, and confidence are all tunable at runtime
- Dark UI — designed to be used alongside any app without distraction
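The hold ring, cooldown, and confidence settings listed above can be pictured as a small gate that sits between the classifier and the text output. The sketch below is illustrative only: the class name `HoldGate`, its method names, and the default values are assumptions, not taken from the GestureKey source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HoldGate:
    """Fires a gesture only after a steady hold, then enforces a cooldown."""
    hold_seconds: float = 0.6       # assumed default; tunable at runtime
    cooldown_seconds: float = 0.8   # assumed default; tunable at runtime
    min_confidence: float = 0.72    # assumed to mirror the classifier threshold

    _current: Optional[str] = field(default=None, init=False)
    _held_since: float = field(default=0.0, init=False)
    _last_fired: float = field(default=float("-inf"), init=False)

    def progress(self, now: float) -> float:
        """Fraction of the hold completed (0.0 to 1.0); could drive the ring arc."""
        if self._current is None:
            return 0.0
        if self.hold_seconds <= 0:
            return 1.0
        return min(1.0, (now - self._held_since) / self.hold_seconds)

    def update(self, gesture: Optional[str], confidence: float, now: float) -> Optional[str]:
        """Feed one frame; returns the gesture name when it fires, else None."""
        if gesture is None or confidence < self.min_confidence:
            self._current = None          # hand lost or unsure: reset the hold
            return None
        if gesture != self._current:
            self._current = gesture       # new gesture: restart the hold timer
            self._held_since = now
            return None
        if now - self._last_fired < self.cooldown_seconds:
            return None                   # still cooling down from the last fire
        if now - self._held_since >= self.hold_seconds:
            self._last_fired = now
            self._held_since = now        # require a fresh hold before repeating
            return gesture
        return None
```

Calling `update()` once per camera frame with a monotonic timestamp is enough; `progress()` gives the value an animated arc would need.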
Requirements: Python 3.10 or 3.11

    git clone https://github.com/fredopoku/gesturekey
    cd gesturekey
    pip install mediapipe opencv-python pyautogui numpy scikit-learn PyQt6

On macOS you may also need:

    pip install pyobjc-framework-Quartz

Start the app:

    python3 gesturekey_ui.py

The camera opens immediately. If you already have a trained model, it activates automatically and the header badge shows ML XX%. Without a trained model, the app falls back to the built-in rule-based classifier.
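The fallback behaviour described here can be sketched as a single selection function. This is a minimal illustration, not the project's code: `classify_hybrid` and the `Classifier` callable type are hypothetical names, and the 0.72 threshold is taken from the pipeline diagram in this README.

```python
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]
# A classifier returns (gesture_name, confidence in [0, 1]) or None if it has no answer.
Classifier = Callable[[Features], Optional[Tuple[str, float]]]

ML_CONFIDENCE_THRESHOLD = 0.72  # the 72% cut-off from the pipeline diagram


def classify_hybrid(
    features: Features,
    ml_model: Optional[Classifier],
    rule_classifier: Classifier,
    threshold: float = ML_CONFIDENCE_THRESHOLD,
) -> Optional[Tuple[str, float, str]]:
    """Prefer the personal ML model; fall back to rules if it is missing or unsure.

    Returns (gesture, confidence, source), where source is "ml" or "rules".
    """
    if ml_model is not None:
        ml_result = ml_model(features)
        if ml_result is not None and ml_result[1] >= threshold:
            return (ml_result[0], ml_result[1], "ml")
    rule_result = rule_classifier(features)
    if rule_result is None:
        return None
    return (rule_result[0], rule_result[1], "rules")
```

Passing `ml_model=None` models the "no trained model yet" case, so the same call path works before and after training.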
The ML model is trained on your own hands rather than a generic dataset, which is why accuracy on trained gestures is high.
In the app:
- Click the Train tab in the right panel
- Click Rec next to any gesture — a 3-second countdown starts
- Hold the gesture steady for ~10 seconds (60 samples captured automatically)
- Repeat for as many gestures as you want
- Click ▶ Train Model — training runs in ~15 seconds
- The model activates immediately. The header badge switches to ML XX%
Each time you retrain, all recorded samples are used — old data is never lost. The more gestures and samples you add, the better the model gets.
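One way to get the "old data is never lost" behaviour is to append every recording session to a single JSON file and retrain on the whole file. The sketch below assumes a simple layout (gesture name mapped to a list of 63-value vectors); the actual schema of `gesture_training_data.json` and the function names here are assumptions.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

Sample = List[float]  # one 63-value keypoint vector (assumed layout)


def load_samples(path: Path) -> Dict[str, List[Sample]]:
    """Read all stored samples; an absent file means nothing recorded yet."""
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as fh:
        return json.load(fh)


def append_samples(path: Path, gesture: str, new_samples: List[Sample]) -> Dict[str, List[Sample]]:
    """Add a recording session to the file without discarding earlier ones."""
    data = load_samples(path)
    data.setdefault(gesture, []).extend(new_samples)
    with path.open("w", encoding="utf-8") as fh:
        json.dump(data, fh)
    return data


def to_training_arrays(data: Dict[str, List[Sample]]) -> Tuple[List[Sample], List[str]]:
    """Flatten the store into parallel feature and label lists for a classifier."""
    features: List[Sample] = []
    labels: List[str] = []
    for gesture, samples in sorted(data.items()):
        for sample in samples:
            features.append(sample)
            labels.append(gesture)
    return features, labels
```

The two lists returned by `to_training_arrays()` are in the shape that scikit-learn estimators accept for `fit(X, y)`.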
From the command line (original trainer):
    python3 gesture_trainer.py

| Gesture | Hand shape | Types |
|---|---|---|
| A | Fist, thumb beside index | a |
| B | Four fingers up, thumb folded | b |
| C | Curved hand, O-like opening | c |
| D | Index pointing up, others form circle | d |
| E | All fingers bent, thumb tucked | e |
| F | Index + thumb touch, others up | f |
| G | Index pointing sideways | g |
| H | Index + middle pointing flat | h |
| I | Pinky up, others closed | i |
| L | L-shape: thumb + index out | l |
| O | All fingers curve to thumb | o |
| V | Peace sign: index + middle spread | v |
| W | Three fingers up (index, middle, ring) | w |
| Y | Thumb + pinky out | y |
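To give a feel for how a rule-based fallback might read some of these shapes from landmarks, here is a deliberately small sketch. It is not GestureKey's classifier: the rule table and function names are assumptions, it covers only four letters, and it ignores the thumb, so it cannot tell I from Y or B from an open palm. The landmark indices are the standard MediaPipe Hands numbering.

```python
from typing import Optional, Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z) in normalised image coordinates

# MediaPipe Hands indices: (fingertip, PIP joint) for each non-thumb finger.
FINGER_JOINTS = {
    "index": (8, 6),
    "middle": (12, 10),
    "ring": (16, 14),
    "pinky": (20, 18),
}

# (index, middle, ring, pinky) extended -> letter. Illustrative subset only.
SHAPE_TO_LETTER = {
    (True, True, True, True): "b",
    (True, True, False, False): "v",
    (True, True, True, False): "w",
    (False, False, False, True): "i",
}


def extended_fingers(landmarks: Sequence[Landmark]) -> Tuple[bool, ...]:
    """A finger counts as extended when its tip is above its PIP joint.

    Image y grows downward, so "above" means a smaller y value. This assumes
    an upright hand facing the camera.
    """
    return tuple(
        landmarks[tip][1] < landmarks[pip][1]
        for tip, pip in FINGER_JOINTS.values()
    )


def classify_by_rules(landmarks: Sequence[Landmark]) -> Optional[str]:
    """Return a letter for a recognised finger pattern, else None."""
    if len(landmarks) != 21:
        return None
    return SHAPE_TO_LETTER.get(extended_fingers(landmarks))
```

Rules like these need no training data, which is what makes them useful as an always-available fallback, but they are far coarser than a model trained on recorded samples.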
| Gesture | Selects |
|---|---|
| 👍 Thumbs up | Word slot 1 (top prediction) |
| ✌️ Peace sign | Word slot 2 |
| 3 fingers | Word slot 3 |
| 4 fingers | Word slot 4 |
| 🤙 Shaka | Word slot 5 |
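The five word slots come from a Trie + bigram predictor. A minimal sketch of that idea follows, assuming a word-frequency table and a table of previous-word to next-word counts as inputs. The class name, the scoring formula (frequency plus a weighted bigram count), and the gesture identifiers in `SLOT_GESTURES` are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


class TrieNode:
    __slots__ = ("children", "frequency")

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.frequency = 0  # > 0 marks the end of a vocabulary word


class WordPredictorSketch:
    def __init__(self, vocabulary: Dict[str, int], bigrams: Dict[str, Dict[str, int]]) -> None:
        self.root = TrieNode()
        self.bigrams = bigrams
        for word, freq in vocabulary.items():
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.frequency = freq

    def _completions(self, prefix: str) -> Dict[str, int]:
        """Collect every vocabulary word that starts with prefix."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return {}
        found: Dict[str, int] = {}
        stack = [(node, prefix)]
        while stack:
            current, word = stack.pop()
            if current.frequency > 0:
                found[word] = current.frequency
            for ch, child in current.children.items():
                stack.append((child, word + ch))
        return found

    def predict(self, prefix: str, previous_word: Optional[str] = None,
                limit: int = 5, bigram_weight: int = 10) -> List[str]:
        """Rank completions by frequency, boosted by the previous-word context."""
        candidates = self._completions(prefix.lower())
        context = self.bigrams.get(previous_word.lower(), {}) if previous_word else {}

        def score(word: str) -> int:
            return candidates[word] + bigram_weight * context.get(word, 0)

        return sorted(candidates, key=lambda w: (-score(w), w))[:limit]


# Hypothetical gesture identifiers for slots 1 to 5, in table order.
SLOT_GESTURES = ["thumbs_up", "peace", "three_fingers", "four_fingers", "shaka"]


def select_word(gesture: str, candidates: List[str]) -> Optional[str]:
    """Map a selection gesture to the candidate in its slot, if that slot is filled."""
    if gesture not in SLOT_GESTURES:
        return None
    slot = SLOT_GESTURES.index(gesture)
    return candidates[slot] if slot < len(candidates) else None
```

With this shape, spelling a prefix calls `predict()`, and a selection gesture calls `select_word()` on the list currently shown in the prediction bar.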
| Gesture | Types |
|---|---|
| Open palm (all 5 spread) | Hello |
| Fist nod | Yes |
| Flat hand from chin | Thank you |
Camera (OpenCV)
│
▼
MediaPipe Hands → 21 landmarks (x, y, z)
│
▼
Normalise → wrist-centred, scale-invariant 63-dim vector
│
├── GestureMLModel (sklearn MLP 256→128→64) ← your personal model
│ confidence ≥ 72% → use result
│
└── Rule-based classifier ← always available fallback
│
▼
GestureMapper → gesture name → character / command
│
├── WordPredictor (Trie + bigram scoring) → 5 candidates
│
└── TextOutput (pyautogui) → types into active OS window
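The normalisation step in the diagram (21 landmarks to a wrist-centred, scale-invariant 63-dim vector) could look roughly like the sketch below. Subtracting the wrist landmark follows directly from "wrist-centred"; dividing by the largest wrist-to-landmark distance is one common way to get scale invariance and is an assumption here, as is the function name.

```python
import math
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z) as produced by the hand tracker
WRIST = 0  # MediaPipe Hands index of the wrist landmark


def normalise_landmarks(landmarks: Sequence[Landmark]) -> List[float]:
    """Return a flat 63-value vector that ignores hand position and size."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")

    # Wrist-centred: express every point relative to the wrist.
    wx, wy, wz = landmarks[WRIST]
    centred = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]

    # Scale-invariant: divide by the largest distance from the wrist
    # (assumed choice; other reference lengths would also work).
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centred)
    if scale == 0.0:
        scale = 1.0  # degenerate input: all points coincide

    return [coord / scale for point in centred for coord in point]
```

Because the output no longer depends on where the hand is in the frame or how close it is to the camera, the same recorded samples stay useful across sessions.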
Model files:
data/
vocabulary.json word frequency table
bigrams.json context pairs for prediction
trie.pkl cached prefix tree
gesture_mlp.pkl your trained MLP model
gesture_training_data.json raw keypoint samples per gesture
- Rule-based gesture classifier
- MediaPipe hand tracking
- PyAutoGUI text injection (all OS apps)
- Word prediction — Trie + bigram engine
- PyQt6 desktop UI — camera, guide, prediction bar, composer
- In-app MLP training pipeline
- Hybrid ML + rule classifier with hot-reload
- Full A–Z trained model (recording in progress)
- TFLite model export — cross-platform inference
- Web version — MediaPipe.js, runs in any browser on any device
- Packaged installers — .app (macOS), .exe (Windows)
- Android app — MediaPipe SDK + Accessibility Service
- iOS app — MediaPipe iOS + system keyboard extension
- Cloud model sync — train once, use everywhere
- Custom gesture vocabulary — define your own mappings
- LLM-powered word prediction
gesturekey/
├── gesturekey_ui.py main application (PyQt6)
├── gesture_mapper.py hybrid ML + rule classifier
├── gesture_model.py MLP training and inference engine
├── gesture_trainer.py CLI gesture recording tool
├── word_predictor.py Trie + bigram word prediction
├── text_output.py pyautogui text injection layer
├── gesture_training_data.json captured keypoint samples
└── data/
├── vocabulary.json
├── bigrams.json
├── trie.pkl
└── gesture_mlp.pkl trained model (created after first training)
Frederick Opoku-Afriyie
MSc Computer Science, 2025
GestureKey concept, system design, and implementation.
Built with:
- MediaPipe Hands — Google Research (Zhang et al., 2020)
- OpenCV — Intel / OpenCV Foundation
- PyQt6 — Riverbank Computing
- scikit-learn — Pedregosa et al., 2011
- PyAutoGUI — Al Sweigart
GestureKey — a universal gesture input paradigm for accessible digital communication.
