Comparison and exploration of different approaches for automatic speech recognition with smart punctuation.
This repository contains two implementations of speech-to-text systems:
| Approach | Technology | Advantage | Performance |
|---|---|---|---|
| C# WPF Desktop | Vosk + Microsoft.ML | Native Windows app | ⭐⭐⭐ |
| Python Web | Whisper + Flask | Automatic punctuation | ⭐⭐⭐⭐⭐ |
STT/
├── README.md                    # This file
├── VoiceNoteAppByLud/           # C# WPF Application
│   ├── MainWindow.xaml
│   ├── MainWindow.xaml.cs
│   ├── App.xaml
│   ├── VoiceNotesApp.csproj
│   └── Models/
│       ├── vosk-model-fr/       # French Vosk model
│       └── encoder_model.onnx   # T5 for summarization
└── stt-whisper-py/              # Python Web App
    ├── app.py                   # Flask + SocketIO
    ├── requirements.txt
    ├── static/                  # Web interface
    ├── templates/
    └── utils/                   # AI for summarization & classification
Advantages (C# WPF):
- High-performance native Windows application
- Modern WPF interface with XAML
- No network dependency
- Complete system integration
Disadvantages (C# WPF):
- No automatic punctuation (Vosk limitation)
- .NET dependency complexity
- Windows-only
- Complex T5 tokenizer configuration
Recommended for: Enterprise Windows applications, offline environments
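
Because Vosk returns unpunctuated text, one rough workaround is to split sentences on long pauses using the word timings in Vosk's JSON result (the `result` array, available when word-level output is enabled). The sketch below is a heuristic only, written in Python for brevity rather than taken from the C# app; the function name `punctuate_by_pauses` and the 0.6 second threshold are assumptions, and it will not match the quality of Whisper's punctuation.

```python
import json


def punctuate_by_pauses(vosk_result_json, pause_seconds=0.6):
    """Insert sentence breaks where the silence between two words exceeds pause_seconds."""
    words = json.loads(vosk_result_json).get("result", [])
    sentences, current = [], []
    previous_end = None
    for entry in words:
        # A long gap since the previous word is treated as a sentence boundary.
        gap_is_long = previous_end is not None and entry["start"] - previous_end > pause_seconds
        if gap_is_long and current:
            sentences.append(" ".join(current))
            current = []
        current.append(entry["word"])
        previous_end = entry["end"]
    if current:
        sentences.append(" ".join(current))
    # Capitalize only the first letter of each sentence and close it with a period.
    return " ".join(s[0].upper() + s[1:] + "." for s in sentences)


# Hand-written sample in the shape of a Vosk result with word timings.
sample = json.dumps({
    "result": [
        {"word": "bonjour", "start": 0.0, "end": 0.5},
        {"word": "tout", "start": 0.55, "end": 0.8},
        {"word": "le", "start": 0.8, "end": 0.9},
        {"word": "monde", "start": 0.9, "end": 1.3},
        {"word": "merci", "start": 2.4, "end": 2.9},
    ],
    "text": "bonjour tout le monde merci",
})
print(punctuate_by_pauses(sample))  # Bonjour tout le monde. Merci.
```

Pause-based splitting only recovers sentence boundaries; commas and question marks would still need a dedicated punctuation model.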
Advantages (Python Web):
- Perfect automatic punctuation (Whisper)
- Responsive web interface (mobile/desktop)
- Real-time via WebSocket
- Automatic summarization + classification
- Cross-platform (Windows/Mac/Linux)
Disadvantages (Python Web):
- Requires Python server
- More resource-intensive
- Local network dependency
Recommended for: Modern applications, distributed teams, rapid prototyping
cd stt-whisper-py/
pip install -r requirements.txt
python app.py
# → http://localhost:5000

cd VoiceNoteAppByLud/
dotnet build
dotnet run

- ✅ Python: Transcription + automatic summarization + classification
- ⚠️ C#: Raw transcription, manual summarization needed
- ✅ Python: Perfect punctuation, JSON export
- ⚠️ C#: Correct transcription but without punctuation
- ✅ Python: Automatic "course" classification, smart summarization
- ✅ C#: Stable local application, no network latency
- ✅ C#: Local security, no data on server
- ✅ Python: Modern interface, collaboration possible
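
To illustrate the JSON export mentioned for the Python app, here is a minimal sketch of serializing a note with its summary and category. The `VoiceNote` class and its field names are hypothetical and are not taken from `app.py`; the actual export format may differ.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class VoiceNote:
    # Hypothetical fields; the real application may store different ones.
    text: str
    summary: str
    category: str


def export_note(note):
    """Serialize a note to JSON, keeping accented characters readable."""
    return json.dumps(asdict(note), ensure_ascii=False, indent=2)


note = VoiceNote(
    text="Aujourd'hui, nous étudions les réseaux de neurones.",
    summary="Introduction aux réseaux de neurones.",
    category="course",
)
print(export_note(note))
```

`ensure_ascii=False` is used so that French transcripts are written as readable UTF-8 instead of `\u` escape sequences.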
| Component | C# WPF | Python Web |
|---|---|---|
| STT Engine | Vosk | OpenAI Whisper |
| Interface | WPF/XAML | HTML5/CSS3/JS |
| Backend | .NET 8 | Flask + SocketIO |
| Summarization | T5 ONNX | Transformers (mT5) |
| Classification | ❌ | LightGBM |
| Audio | NAudio | Web Audio API |
| Real-time | ❌ | WebSocket |
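
As a rough illustration of the Whisper row, the sketch below transcribes a file with the `openai-whisper` package and prints timestamped segments. It assumes that package is installed; the function names, the `base` model size, and the `fr` language default are assumptions, not the settings used in `app.py`.

```python
def format_segments(segments):
    """Render Whisper-style segments as '[start - end] text' lines."""
    lines = []
    for seg in segments:
        lines.append(f"[{seg['start']:.1f}s - {seg['end']:.1f}s] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(audio_path, model_name="base", language="fr"):
    """Transcribe an audio file; the result dict holds 'text' and 'segments'."""
    # Imported lazily so the formatting helper works without the package installed.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, language=language)


# Hand-written sample in the shape of the "segments" list returned by transcribe().
sample_segments = [
    {"start": 0.0, "end": 2.4, "text": " Bonjour à tous."},
    {"start": 2.4, "end": 5.1, "text": " Commençons la réunion."},
]
print(format_segments(sample_segments))
```

With a real recording, `format_segments(transcribe_file("note.wav")["segments"])` would produce the same layout, where `note.wav` is a placeholder file name.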
| Criteria | C# WPF | Python Web | Winner |
|---|---|---|---|
| Punctuation | ❌ | ✅ Perfect | Python |
| Speed | ✅ Fast | ⚡ Real-time | Tie |
| Accuracy | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | Python |
| Setup Ease | | ✅ Simple | Python |
| Stability | ✅ Robust | ✅ Robust | Tie |
| Security | ✅ Local | | C# |
Choose the Python web app if:
- You want automatic punctuation (key feature)
- You need a modern, responsive interface
- You require automatic summarization and classification
- Your team is distributed or needs to collaborate
- You are prototyping rapidly
Choose the C# WPF app if:
- You require maximum security (local data only)
- Your environment is 100% Windows
- You are building a traditional enterprise application
- Native performance is critical
- No server is allowed
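
The two sets of selection criteria above can be turned into a small checklist helper. This is only a sketch: the requirement keys and the `recommend` function are made up for illustration, and each criterion is weighted equally, which the lists themselves do not claim.

```python
# Hypothetical keys, one per criterion in the two lists.
PYTHON_REASONS = {
    "automatic_punctuation", "responsive_ui", "summarization",
    "collaboration", "rapid_prototyping",
}
CSHARP_REASONS = {
    "local_data_only", "windows_only", "enterprise_desktop",
    "native_performance", "no_server",
}


def recommend(requirements):
    """Return the approach that matches more of the given requirements."""
    requirements = set(requirements)
    unknown = requirements - PYTHON_REASONS - CSHARP_REASONS
    if unknown:
        raise ValueError(f"Unknown requirements: {sorted(unknown)}")
    python_score = len(requirements & PYTHON_REASONS)
    csharp_score = len(requirements & CSHARP_REASONS)
    if python_score > csharp_score:
        return "Python Web"
    if csharp_score > python_score:
        return "C# WPF"
    return "either"


print(recommend({"automatic_punctuation", "summarization"}))
print(recommend({"no_server", "windows_only", "responsive_ui"}))
```

In practice a hard constraint such as "no server allowed" should override the count, so treat the result as a starting point for discussion.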
Possible improvements for the Python web app:
- Extended multilingual support
- REST API for integration
- Offline mode with local Whisper
- Advanced analytics dashboard
Possible improvements for the C# WPF app:
- Azure Speech Services integration (punctuation)
- Whisper.NET plugin
- More modern interface (WinUI 3)
Pull Requests are welcome for both projects!
- Issues: Report bugs and suggestions
- Features: Propose new features
- Docs: Improve documentation
MIT License - Feel free to use for personal and commercial projects.
Happy transcribing! May the AI force be with your words.