GestureKey

🚧 Actively in development — new features, gestures, and platform support are being added continuously. Star or watch to follow progress.

Real-time hand gesture communication — type into any app on any device using only your hands.

GestureKey turns your laptop camera into a full gesture keyboard. It recognises ASL-inspired hand shapes, predicts words as you spell, and injects text directly into WhatsApp, Gmail, Google Docs, or any application — no controller, no wearable, no plug-in required.

[Screenshot: GestureKey interface]


What makes GestureKey different

| Feature | GestureKey | Others |
| --- | --- | --- |
| Personal ML model trained to your hands | ✅ | ❌ Generic |
| Full A–Z alphabet + word prediction | ✅ | ❌ 5–10 gestures |
| Types into every app on your OS | ✅ | ❌ Locked in-app |
| Train your own model, no code needed | ✅ | |
| Hybrid ML + rule fallback, always works | ✅ | |
| Open, extensible gesture vocabulary | ✅ | |

Features

  • Live hand tracking — MediaPipe Hands, 21-landmark skeleton overlay
  • Hybrid classifier — MLP neural network (your personal model) with rule-based fallback; always responsive even before training
  • In-app training — record gesture samples, train a model, and activate it without leaving the app
  • 96%+ accuracy on trained gestures
  • Word prediction — Trie + bigram engine surfaces 5 candidates as you spell; selection by gesture or click
  • Gesture trail — last 6 gestures shown live in the UI
  • Text injection — pyautogui types into any active OS window
  • Animated hold ring — circular arc shows hold progress before a gesture fires
  • Settings — cooldown, hold threshold, and confidence are all tunable at runtime
  • Dark UI — designed to be used alongside any app without distraction
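The hold ring and the tunable cooldown and hold threshold suggest a small gating step between the classifier and the text output. The following is a minimal sketch of such a gate, not the project's code: the class name `HoldGate`, its fields, and the default timings are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HoldGate:
    """Fires a gesture once it has been held long enough, then enforces a cooldown."""

    hold_seconds: float = 0.6       # assumed default, not taken from the project
    cooldown_seconds: float = 0.8   # assumed default, not taken from the project
    _current: Optional[str] = None
    _held_since: float = 0.0
    _last_fired: float = float("-inf")

    def progress(self, now: float) -> float:
        """Fraction of the hold completed (0.0 to 1.0), e.g. for drawing the ring."""
        if self._current is None:
            return 0.0
        return min(1.0, (now - self._held_since) / self.hold_seconds)

    def update(self, gesture: Optional[str], now: float) -> Optional[str]:
        """Feed one frame's gesture; return the gesture name when it fires."""
        if gesture != self._current:
            # Gesture changed (or the hand was lost): restart the hold timer.
            self._current = gesture
            self._held_since = now
            return None
        if gesture is None:
            return None
        held_long_enough = now - self._held_since >= self.hold_seconds
        cooled_down = now - self._last_fired >= self.cooldown_seconds
        if held_long_enough and cooled_down:
            self._last_fired = now
            self._held_since = now  # require a fresh hold before repeating
            return gesture
        return None
```

Calling `update()` once per camera frame with the current time (for example from `time.monotonic()`) returns a gesture name only on the frame where it should be typed; `progress()` gives the value an arc widget could render.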

Installation

Requirements: Python 3.10 or 3.11

git clone https://github.com/fredopoku/gesturekey
cd gesturekey

pip install mediapipe opencv-python pyautogui numpy scikit-learn PyQt6

On macOS you may also need:

pip install pyobjc-framework-Quartz

Running

python3 gesturekey_ui.py

The camera opens immediately. If you already have a trained model it activates automatically — the header badge shows ML XX%. Without a trained model it falls back to the built-in rule-based classifier instantly.
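The fallback behaviour described above can be sketched as a single selection function. This is an illustrative sketch only: `classify`, `ml_predict`, and `rule_predict` are hypothetical names, and the 0.72 default mirrors the 72% confidence threshold shown in the Architecture section.

```python
from typing import Callable, Optional, Sequence, Tuple

Features = Sequence[float]
MLPredict = Callable[[Features], Tuple[str, float]]  # returns (label, confidence)
RulePredict = Callable[[Features], Optional[str]]    # returns a label or None


def classify(
    features: Features,
    ml_predict: Optional[MLPredict],
    rule_predict: RulePredict,
    min_confidence: float = 0.72,
) -> Tuple[Optional[str], str]:
    """Return (label, source), where source is "ml" or "rules"."""
    if ml_predict is not None:
        label, confidence = ml_predict(features)
        if confidence >= min_confidence:
            return label, "ml"
    # No trained model, or the model is not confident enough: use the rules.
    return rule_predict(features), "rules"
```

Passing `ml_predict=None` models the case where no model has been trained yet, so the rule-based path is used on every frame.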


Training your personal model

The ML model is trained on your hands, not a generic dataset. This is why accuracy is high.

In the app:

  1. Click the Train tab in the right panel
  2. Click Rec next to any gesture — a 3-second countdown starts
  3. Hold the gesture steady for ~10 seconds (60 samples captured automatically)
  4. Repeat for as many gestures as you want
  5. Click ▶ Train Model — training runs in ~15 seconds
  6. The model activates immediately. The header badge switches to ML XX%

Each time you retrain, all recorded samples are used — old data is never lost. The more gestures and samples you add, the better the model gets.
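One way to keep old samples across retraining is to append each recording session to a single JSON file and train on everything in it. The sketch below assumes a file layout of `{"gesture name": [[63 floats], ...]}`; the real layout of `gesture_training_data.json` is not documented here, and `append_samples` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict, List

Sample = List[float]  # one normalised keypoint vector (assumed to be 63 values)


def append_samples(path: Path, gesture: str, new_samples: List[Sample]) -> Dict[str, List[Sample]]:
    """Merge new samples into the stored data set and write it back to disk."""
    data: Dict[str, List[Sample]] = {}
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
    # Existing samples for this gesture are kept; new ones are added after them.
    data.setdefault(gesture, []).extend(new_samples)
    path.write_text(json.dumps(data), encoding="utf-8")
    return data
```

The returned dictionary holds every sample recorded so far, so it can be flattened into feature and label lists for the next training run.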

From the command line (original trainer):

python3 gesture_trainer.py

Gesture reference

Letters (ASL-inspired)

| Gesture | Hand shape | Types |
| --- | --- | --- |
| A | Fist, thumb beside index | a |
| B | Four fingers up, thumb folded | b |
| C | Curved hand, O-like opening | c |
| D | Index pointing up, others form circle | d |
| E | All fingers bent, thumb tucked | e |
| F | Index + thumb touch, others up | f |
| G | Index pointing sideways | g |
| H | Index + middle pointing flat | h |
| I | Pinky up, others closed | i |
| L | L-shape: thumb + index out | l |
| O | All fingers curve to thumb | o |
| V | Peace sign: index + middle spread | v |
| W | Three fingers up (index, middle, ring) | w |
| Y | Thumb + pinky out | y |

Selection gestures (word prediction)

| Gesture | Selects |
| --- | --- |
| 👍 Thumbs up | Word slot 1 (top prediction) |
| ✌️ Peace sign | Word slot 2 |
| 3 fingers | Word slot 3 |
| 4 fingers | Word slot 4 |
| 🤙 Shaka | Word slot 5 |

Word gestures

| Gesture | Types |
| --- | --- |
| Open palm (all 5 spread) | Hello |
| Fist nod | Yes |
| Flat hand from chin | Thank you |
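The three tables above amount to a lookup from a recognised gesture name to either text to type or a prediction slot to select. A minimal sketch of that lookup follows. The gesture names (`open_palm`, `thumbs_up`, and so on) and the `(action, payload)` return shape are assumptions for illustration, and the sketch treats the letter V and the peace-sign selection as distinct labels because the README does not say how the app tells them apart.

```python
from typing import Optional, Tuple

# Hypothetical gesture names; the project's real labels may differ.
LETTERS = {letter: letter.lower() for letter in "ABCDEFGHILOVWY"}
WORDS = {"open_palm": "Hello", "fist_nod": "Yes", "flat_hand_from_chin": "Thank you"}
SLOTS = {"thumbs_up": 1, "peace_sign": 2, "three_fingers": 3, "four_fingers": 4, "shaka": 5}


def map_gesture(name: str) -> Optional[Tuple[str, object]]:
    """Translate a recognised gesture name into an (action, payload) pair."""
    if name in LETTERS:
        return ("type", LETTERS[name])
    if name in WORDS:
        return ("type", WORDS[name])
    if name in SLOTS:
        return ("select", SLOTS[name])
    return None  # unknown gesture: do nothing
```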

Architecture

Camera (OpenCV)
    │
    ▼
MediaPipe Hands  →  21 landmarks (x, y, z)
    │
    ▼
Normalise  →  wrist-centred, scale-invariant 63-dim vector
    │
    ├── GestureMLModel (sklearn MLP 256→128→64)   ← your personal model
    │       confidence ≥ 72% → use result
    │
    └── Rule-based classifier                      ← always available fallback
    │
    ▼
GestureMapper  →  gesture name → character / command
    │
    ├── WordPredictor (Trie + bigram scoring)  →  5 candidates
    │
    └── TextOutput (pyautogui)  →  types into active OS window
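The normalisation step in the diagram can be sketched in a few lines. This version subtracts the wrist (landmark 0 in MediaPipe Hands) from every point and divides by the largest wrist-to-landmark distance. The choice of scale factor is an assumption; the project may normalise by a different reference length.

```python
import math
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z) as reported per landmark


def normalise(landmarks: Sequence[Landmark]) -> List[float]:
    """Turn 21 landmarks into a wrist-centred, scale-invariant 63-value vector."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    wx, wy, wz = landmarks[0]  # landmark 0 is the wrist
    centred = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Assumed scale: the farthest landmark from the wrist ends up at distance 1.
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centred)
    if scale == 0.0:
        scale = 1.0  # degenerate input: all points coincide
    return [value / scale for point in centred for value in point]
```

Because the wrist is subtracted and the result is divided by a length measured on the same hand, moving the hand around the frame or closer to the camera leaves the vector essentially unchanged, which is what lets a small MLP learn from a modest number of samples.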

Model files:

data/
  vocabulary.json          word frequency table
  bigrams.json             context pairs for prediction
  trie.pkl                 cached prefix tree
  gesture_mlp.pkl          your trained MLP model
gesture_training_data.json raw keypoint samples per gesture
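To illustrate how a word frequency table and bigram context pairs can combine into ranked candidates, here is a compact prefix-tree sketch. It is not the project's `WordPredictor`: the class name `PrefixPredictor`, the in-memory shapes assumed for `vocabulary.json` (`{word: count}`) and `bigrams.json` (`{previous: {next: count}}`), and the `bigram_weight` parameter are all hypothetical.

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set on the node that ends a word


class PrefixPredictor:
    def __init__(self, vocabulary: Dict[str, int], bigrams: Dict[str, Dict[str, int]]) -> None:
        self.vocabulary = vocabulary
        self.bigrams = bigrams
        self.root = TrieNode()
        for word in vocabulary:
            node = self.root
            for ch in word:
                node = node.children.setdefault(ch, TrieNode())
            node.word = word

    def _completions(self, prefix: str) -> List[str]:
        """Collect every stored word that starts with the prefix."""
        node = self.root
        for ch in prefix:
            if ch not in node.children:
                return []
            node = node.children[ch]
        found: List[str] = []
        stack = [node]
        while stack:
            current = stack.pop()
            if current.word is not None:
                found.append(current.word)
            stack.extend(current.children.values())
        return found

    def predict(self, prefix: str, previous_word: Optional[str] = None,
                limit: int = 5, bigram_weight: int = 10) -> List[str]:
        """Rank completions by frequency, boosted when they follow previous_word."""
        following = self.bigrams.get(previous_word, {}) if previous_word else {}

        def score(word: str) -> int:
            return self.vocabulary[word] + bigram_weight * following.get(word, 0)

        # Sort by descending score, breaking ties alphabetically.
        return sorted(self._completions(prefix), key=lambda w: (-score(w), w))[:limit]
```

With `limit=5` the result lines up with the five selection slots: the first entry would be chosen by a thumbs up, the second by a peace sign, and so on.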

Roadmap

  • Rule-based gesture classifier
  • MediaPipe hand tracking
  • PyAutoGUI text injection (all OS apps)
  • Word prediction — Trie + bigram engine
  • PyQt6 desktop UI — camera, guide, prediction bar, composer
  • In-app MLP training pipeline
  • Hybrid ML + rule classifier with hot-reload
  • Full A–Z trained model (recording in progress)
  • TFLite model export — cross-platform inference
  • Web version — MediaPipe.js, runs in any browser on any device
  • Packaged installers — .app (macOS), .exe (Windows)
  • Android app — MediaPipe SDK + Accessibility Service
  • iOS app — MediaPipe iOS + system keyboard extension
  • Cloud model sync — train once, use everywhere
  • Custom gesture vocabulary — define your own mappings
  • LLM-powered word prediction

Project structure

gesturekey/
├── gesturekey_ui.py          main application (PyQt6)
├── gesture_mapper.py         hybrid ML + rule classifier
├── gesture_model.py          MLP training and inference engine
├── gesture_trainer.py        CLI gesture recording tool
├── word_predictor.py         Trie + bigram word prediction
├── text_output.py            pyautogui text injection layer
├── gesture_training_data.json captured keypoint samples
└── data/
    ├── vocabulary.json
    ├── bigrams.json
    ├── trie.pkl
    └── gesture_mlp.pkl       trained model (created after first training)

Author

Frederick Opoku-Afriyie
MSc Computer Science, 2025

GestureKey concept, system design, and implementation.

GestureKey — a universal gesture input paradigm for accessible digital communication.
