MarcoPalomo/stt-vosk-whisper

🎤 STT (Speech-To-Text) - Speech Recognition Projects

Comparison and exploration of different approaches for automatic speech recognition with smart punctuation.

Overview

This repository contains two implementations of speech-to-text systems:

πŸ—οΈ Approach πŸ”§ Technology 🎯 Advantage ⚑ Performance
C# WPF Desktop Vosk + Microsoft.ML Native Windows app ⭐⭐⭐
Python Web Whisper + Flask Automatic punctuation ⭐⭐⭐⭐⭐

Project Structure

STT/
├── README.md                    # 👈 This file
├── VoiceNoteAppByLud/          # 🖥️ C# WPF Application
│   ├── MainWindow.xaml
│   ├── MainWindow.xaml.cs
│   ├── App.xaml
│   ├── VoiceNotesApp.csproj
│   └── Models/
│       ├── vosk-model-fr/      # French Vosk model
│       └── encoder_model.onnx   # T5 for summarization
└── stt-whisper-py/             # 🌐 Python Web App
    ├── app.py                  # Flask + SocketIO
    ├── requirements.txt
    ├── static/                 # Web interface
    ├── templates/
    └── utils/                  # AI for summarization & classification

Approach Comparison

C# WPF Version (VoiceNoteAppByLud/)

Advantages:

  • High-performance native Windows application
  • Modern WPF interface with XAML
  • No network dependency
  • Complete system integration

Disadvantages:

  • No automatic punctuation (Vosk limitation)
  • .NET dependency complexity
  • Windows-only
  • Complex T5 tokenizer configuration

Recommended for: Enterprise Windows applications, offline environments
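
The punctuation limitation is easiest to see in the shape of Vosk's output: each result is a JSON string whose `text` field typically holds lowercase words with no punctuation. The sketch below uses the Vosk Python bindings purely for illustration (the application itself uses the C# bindings); the helper names, the chunk size, and the `Models/vosk-model-fr` default path are assumptions, not code from this repository.

```python
import json
import wave


def collect_vosk_text(recognizer, chunks):
    """Feed PCM chunks to a Vosk-style recognizer and join the recognized texts."""
    parts = []
    for chunk in chunks:
        # AcceptWaveform returns a truthy value once an utterance is complete.
        if recognizer.AcceptWaveform(chunk):
            parts.append(json.loads(recognizer.Result()).get("text", ""))
    # FinalResult flushes whatever audio is still buffered.
    parts.append(json.loads(recognizer.FinalResult()).get("text", ""))
    return " ".join(p for p in parts if p)


def transcribe_wav(wav_path, model_dir="Models/vosk-model-fr"):
    """Transcribe a 16-bit mono WAV file (model_dir is an assumed path)."""
    from vosk import Model, KaldiRecognizer  # third-party: pip install vosk

    with wave.open(wav_path, "rb") as wav:
        recognizer = KaldiRecognizer(Model(model_dir), wav.getframerate())
        # Read 4000 frames at a time until the file is exhausted.
        chunks = iter(lambda: wav.readframes(4000), b"")
        return collect_vosk_text(recognizer, chunks)
```

Any punctuation would have to be restored by a separate step after `collect_vosk_text` returns, which is the gap the Whisper-based version avoids.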


Python Web Version (stt-whisper-py/)

Advantages:

  • Perfect automatic punctuation (Whisper)
  • Responsive web interface (mobile/desktop)
  • Real-time via WebSocket
  • Automatic summarization + classification
  • Cross-platform (Windows/Mac/Linux)

Disadvantages:

  • Requires Python server
  • More resource-intensive
  • Local network dependency

Recommended for: Modern applications, distributed teams, rapid prototyping
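
As a rough illustration of where the automatic punctuation comes from, the sketch below wraps the `load_model` and `transcribe` calls of the `openai-whisper` package. The model size, the `fr` language default, and the helper names are assumptions; the repository's `app.py` may be organized differently.

```python
def load_whisper_model(size="base"):
    """Load an OpenAI Whisper model (the size is an assumption, not the repo's)."""
    import whisper  # third-party: pip install openai-whisper

    return whisper.load_model(size)


def transcribe_file(model, audio_path, language="fr"):
    """Return the punctuated text and per-segment timings for one audio file."""
    result = model.transcribe(audio_path, language=language)
    segments = [
        {"start": seg["start"], "end": seg["end"], "text": seg["text"].strip()}
        for seg in result.get("segments", [])
    ]
    return {"text": result["text"].strip(), "segments": segments}
```

A call such as `transcribe_file(load_whisper_model(), "note.wav")` would return text that already carries capitalization and punctuation, with no post-processing step.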

Quick Start

Option 1: Python Web (Recommended)

cd stt-whisper-py/
pip install -r requirements.txt
python app.py
# → http://localhost:5000

Option 2: C# Desktop

cd VoiceNoteAppByLud/
dotnet build
dotnet run

Use Cases

Meeting Notes

  • ✅ Python: Transcription + automatic summarization + classification
  • ⚠️ C#: Raw transcription, manual summarization needed

Interviews

  • ✅ Python: Perfect punctuation, JSON export
  • ⚠️ C#: Correct transcription but without punctuation
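
The JSON export mentioned for interviews could look roughly like the sketch below. The field names (`language`, `text`, `segments`) and the function name are hypothetical; the actual export format of the Python app is not documented here.

```python
import json


def transcript_to_json(text, segments, language="fr"):
    """Serialize a transcript to a JSON string (hypothetical schema)."""
    payload = {
        "language": language,
        "text": text,
        # Each segment is expected to be a dict with start, end, and text keys.
        "segments": list(segments),
    }
    # ensure_ascii=False keeps accented French characters readable.
    return json.dumps(payload, ensure_ascii=False, indent=2)
```

Keeping the punctuated `text` alongside timed `segments` lets a reader jump from a quote back to the matching point in the recording.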

Courses & Training

  • ✅ Python: Automatic "course" classification, smart summarization
  • ✅ C#: Stable local application, no network latency

Enterprise Environment

  • ✅ C#: Local security, no data on server
  • ✅ Python: Modern interface, collaboration possible

Technologies Used

| Component | C# WPF | Python Web |
|---|---|---|
| STT Engine | Vosk | OpenAI Whisper |
| Interface | WPF/XAML | HTML5/CSS3/JS |
| Backend | .NET 8 | Flask + SocketIO |
| Summarization | T5 ONNX | Transformers (mT5) |
| Classification | ❌ | LightGBM |
| Audio | NAudio | Web Audio API |
| Real-time | ❌ | WebSocket |
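
To make the Flask + SocketIO row more concrete, here is a minimal sketch of how a real-time audio path could be wired with Flask-SocketIO. The event names (`audio_chunk`, `stop`, `transcript`), the per-client buffering strategy, and the `transcribe` callback are all assumptions made for illustration and are not taken from `app.py`.

```python
def append_chunk(buffers, session_id, chunk):
    """Accumulate raw audio bytes per client and return the buffered size."""
    buffers.setdefault(session_id, bytearray()).extend(chunk)
    return len(buffers[session_id])


def create_app(transcribe):
    """Build a Flask + SocketIO app; `transcribe` maps audio bytes to text."""
    from flask import Flask, request  # third-party: pip install flask
    from flask_socketio import SocketIO, emit  # pip install flask-socketio

    app = Flask(__name__)
    socketio = SocketIO(app)
    buffers = {}

    @socketio.on("audio_chunk")  # hypothetical event name
    def on_audio_chunk(chunk):
        # request.sid identifies the connected WebSocket client.
        append_chunk(buffers, request.sid, chunk)

    @socketio.on("stop")  # hypothetical event name
    def on_stop():
        audio = bytes(buffers.pop(request.sid, b""))
        # emit() replies to the client that sent the event.
        emit("transcript", {"text": transcribe(audio)})

    return app, socketio
```

With this layout, `app, socketio = create_app(my_transcriber)` followed by `socketio.run(app, port=5000)` would serve the endpoint; the browser side would send microphone data captured through the Web Audio API.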

Performance Metrics

| Criteria | C# WPF | Python Web | Winner |
|---|---|---|---|
| Punctuation | ❌ | ✅ Perfect | Python |
| Speed | ✅ Fast | ⚡ Real-time | Tie |
| Accuracy | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | 🐍 Python |
| Setup Ease | ⚠️ Complex | ✅ Simple | 🐍 Python |
| Stability | ✅ Robust | ✅ Robust | 🤝 Tie |
| Security | ✅ Local | ⚠️ Server | 🖥️ C# |

Recommendations

Choose Python Web if:

  • You want automatic punctuation (key feature)
  • You need a modern, responsive interface
  • You require automatic summarization and classification
  • Your team is distributed or needs to collaborate
  • You are prototyping rapidly

Choose C# WPF if:

  • Maximum security is required (local data only)
  • Your environment is 100% Windows
  • You are building a traditional enterprise application
  • Native performance is critical
  • No server is allowed
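
The two checklists can be read as a simple vote. The sketch below encodes them that way; the requirement keys, the equal weighting, and the tie-break in favor of Python Web (the option this README recommends in Quick Start) are assumptions, not a rule stated by the project.

```python
PYTHON_WEB = "Python Web"
CSHARP_WPF = "C# WPF"

# Requirement keys are hypothetical labels for the checklist items above.
PYTHON_CRITERIA = {
    "punctuation", "responsive_ui", "summarization", "collaboration", "prototyping",
}
CSHARP_CRITERIA = {
    "local_data_only", "windows_only", "enterprise_app", "native_performance", "no_server",
}


def recommend_approach(requirements):
    """Pick the approach whose checklist matches more of the requirements."""
    wanted = set(requirements)
    python_score = len(PYTHON_CRITERIA & wanted)
    csharp_score = len(CSHARP_CRITERIA & wanted)
    # Ties fall back to Python Web (an assumed default).
    return CSHARP_WPF if csharp_score > python_score else PYTHON_WEB
```

For example, `recommend_approach({"local_data_only", "no_server"})` selects the C# WPF version, while a request that only lists `"punctuation"` selects Python Web.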

Future Development

Upcoming Python Features:

  • Extended multilingual support
  • REST API for integration
  • Offline mode with local Whisper
  • Advanced analytics dashboard

Possible C# Improvements:

  • Azure Speech Services integration (punctuation)
  • Whisper.NET plugin
  • More modern interface (WinUI 3)

🤝 Contributing

Pull Requests are welcome for both projects!

  • Issues: Report bugs and suggestions
  • Features: Propose new features
  • Docs: Improve documentation

📄 License

MIT License - Feel free to use for personal and commercial projects.


🎤 Happy transcribing! - May the AI force be with your words.
