VoiceBridge Desktop replaces the Chrome Extension with an Electron + Preact desktop application that installs an OS-level virtual microphone driver. Instead of injecting into browser tabs via content scripts and WebRTC track replacement, the desktop app creates a system-wide virtual audio device ("VoiceBridge Mic") that any application can select as its microphone input. The translation pipeline (STT → LLM → TTS) is reused wholesale from the existing codebase.
```
Chrome Extension (current):
Content Script → getUserMedia intercept → WebRTC replaceTrack → Meeting App

Desktop App (new):
Native Addon → Real Mic Capture → Pipeline → Virtual Mic Driver → ANY App
```
The desktop app eliminates all meeting-platform-specific adapters, content script injection, offscreen documents, and the fragile audio bridge between extension contexts. Because the virtual mic driver operates at the OS audio layer, it works with any application that lets the user choose its microphone input.

| Module | Status | Changes Required |
|---|---|---|
| PipelineOrchestrator | Reuse | Replace `chrome.*` calls with Electron IPC; remove the `chrome.runtime.sendMessage` dependency |
| STTClient | Reuse | No changes (pure WebSocket client) |
| TranslationEngine | Reuse | No changes (pure HTTP/fetch streaming) |
| TTSClient | Reuse | No changes (pure WebSocket client) |
| EchoCancellationModule | Reuse | No changes (pure state machine) |
| LatencyMonitor | Reuse | No changes (pure timing logic) |
| DegradationManager | Reuse | No changes (pure state computation) |
| CleanupSequencer | Reuse | No changes (pure cleanup orchestration) |
| AudioCaptureModule | Replace | New N-API-based capture instead of `getUserMedia` |
| AudioOutputModule | Replace | New N-API-based virtual mic writer instead of a WebRTC destination |
| VoiceProfileManager | Reuse | Replace `MediaRecorder` with N-API recording; REST API calls unchanged |
| SettingsStore | Replace | Filesystem JSON + Node.js `crypto` instead of `chrome.storage` + Web Crypto |
| MessageBus | Replace | Electron IPC (`ipcMain`/`ipcRenderer`) instead of `chrome.runtime.sendMessage` |
| DebugLog | Reuse | Increase buffer to 500 entries; no structural changes |
| AudioBridge | Remove | No longer needed; direct N-API calls replace the `MessageChannel` bridge |
| MeetingDetector | Remove | No longer needed; the virtual mic works with any app |
| PlatformAdapters | Remove | No longer needed; no per-meeting-app injection |

```mermaid
graph TB
    subgraph "Electron Main Process"
        M[Main Process<br/>Node.js + N-API]
        IPC[IPC Router]
        NA[Native Addon<br/>napi-rs]
        SS[SettingsStore<br/>Filesystem + AES-GCM]
        DI[DriverInstaller]
        AT[AutoStart Manager]
    end
    subgraph "Electron Renderer Process"
        R[Preact UI<br/>Main Window]
        T[System Tray]
    end
    subgraph "OS Audio Layer"
        RM[Real Microphone]
        VM["Virtual Mic Driver<br/>#quot;VoiceBridge Mic#quot;"]
        MA[Meeting App<br/>Zoom / Teams / etc.]
    end
    subgraph "Pipeline (Main Process)"
        PO[PipelineOrchestrator]
        STT[STTClient<br/>WebSocket]
        TE[TranslationEngine<br/>HTTP Stream]
        TTS[TTSClient<br/>WebSocket]
        EC[EchoCancellation]
        LM[LatencyMonitor]
        DM[DegradationManager]
    end
    R <-->|contextBridge IPC| IPC
    T <-->|Electron Tray API| M
    IPC <--> M
    M --> NA
    RM -->|Capture PCM 16kHz| NA
    NA -->|Write PCM 48kHz| VM
    VM -->|OS Audio Route| MA
    M --> SS
    M --> DI
    M --> AT
    PO --> STT
    PO --> TE
    PO --> TTS
    PO --> EC
    PO --> LM
    PO --> DM
    M --> PO
    NA -->|Audio chunks| PO
    PO -->|TTS audio| NA
```
The Electron renderer process runs with contextIsolation: true and nodeIntegration: false. All sensitive operations (API calls, encryption, native addon access, file I/O) happen exclusively in the main process. The renderer communicates via a typed contextBridge API that exposes only the methods needed for the UI.
```typescript
// SECURITY: Renderer has NO access to:
// - Node.js APIs (fs, crypto, child_process)
// - Native addon
// - API keys (never leave main process)
// - Raw IPC (only contextBridge-exposed methods)
```

```mermaid
sequenceDiagram
    participant R as Renderer (Preact)
    participant CB as contextBridge
    participant M as Main Process
    participant NA as Native Addon
    participant PO as Pipeline
    R->>CB: voicebridge.startSession({src, tgt})
    CB->>M: ipcRenderer.invoke('session:start', {src, tgt})
    M->>NA: nativeAddon.startCapture(deviceId)
    M->>PO: pipeline.startSession({src, tgt})
    NA-->>M: onAudioChunk(pcm)
    M->>PO: handleAudioChunk(pcm)
    PO->>PO: STT → Translate → TTS
    PO-->>M: onTTSAudio(pcm)
    M->>NA: nativeAddon.writeVirtualMic(pcm)
    M-->>R: event: 'pipeline:latency-update'
```
All IPC messages are validated against a typed schema. The main process rejects any message that doesn't match the expected shape.
The N-API addon (built with napi-rs for Rust safety and cross-compilation) provides the bridge between Node.js and OS audio APIs.
```typescript
/** Native addon interface exposed to the main process */
interface NativeAudioAddon {
  // ── Device Enumeration ──
  enumerateInputDevices(): AudioDeviceInfo[];
  getDefaultInputDevice(): AudioDeviceInfo | null;

  // ── Real Mic Capture ──
  startCapture(deviceId: string, config: CaptureConfig): void;
  stopCapture(): void;
  setCaptureGain(gainDb: number): void;
  onAudioChunk: (callback: (pcm: Buffer, sequenceId: number) => void) => void;

  // ── Virtual Mic Driver ──
  isDriverInstalled(): boolean;
  getDriverVersion(): string | null;
  installDriver(): DriverInstallResult;
  uninstallDriver(): boolean;
  writeVirtualMic(pcm: Buffer): void;
  getDriverStatus(): DriverStatus;

  // ── Resampling ──
  resample(pcm: Buffer, fromRate: number, toRate: number): Buffer;
}

interface AudioDeviceInfo {
  id: string;
  name: string;
  sampleRate: number;
  channels: number;
  isDefault: boolean;
}

interface CaptureConfig {
  sampleRate: 16000;
  channels: 1;
  bufferSizeMs: 250;
  format: 'pcm_s16le';
}

interface DriverInstallResult {
  success: boolean;
  error?: string;
  osErrorCode?: number;
  requiresReboot?: boolean;
}

type DriverStatus =
  | { state: 'installed'; version: string; active: boolean; sampleRate: number }
  | { state: 'not-installed' }
  | { state: 'error'; error: string };
```

```mermaid
graph LR
    subgraph "napi-rs Native Addon"
        API[Unified N-API Interface]
    end
    subgraph "macOS"
        HAL[CoreAudio HAL Plugin<br/>AudioServerPlugin]
        CA[CoreAudio<br/>AudioHardware API]
    end
    subgraph "Windows"
        WAS[WASAPI Virtual Endpoint<br/>IAudioClient]
        APO[Audio Processing Object]
    end
    subgraph "Linux"
        PA[PulseAudio<br/>module-null-sink]
        LB[module-loopback]
    end
    API --> HAL
    API --> WAS
    API --> PA
    HAL --> CA
    WAS --> APO
    PA --> LB
```
macOS (CoreAudio HAL Plugin):

- Registers as an `AudioServerPlugin` bundle in `/Library/Audio/Plug-Ins/HAL/`
- Creates a virtual audio device with 1 channel, 48kHz, Float32
- The plugin reads from a shared memory ring buffer that the N-API addon writes to
- Requires `sudo` for installation (copies the plugin bundle to a system directory)

Windows (WASAPI Virtual Audio Endpoint):

- Registers a virtual audio endpoint driver via Windows Audio Device Graph
- Creates a device named "VoiceBridge Mic" with 1 channel, 48kHz, PCM Int16
- Uses a named shared memory section for audio data transfer
- Requires administrator elevation for driver installation

Linux (PulseAudio):

- Creates a null sink via `pactl load-module module-null-sink sink_name=voicebridge_mic sink_properties=device.description="VoiceBridge Mic"`
- Configures `module-loopback` to route the sink's monitor to a virtual source
- No elevation required; PulseAudio modules are user-space
- Configuration persists via `~/.config/pulse/default.pa`
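The macOS notes say the HAL plugin reads from a shared memory ring buffer that the addon writes to, but they do not specify the buffer's layout. The following is a minimal sketch. It assumes a single-producer/single-consumer ring of Float32 samples indexed by two monotonically increasing 32-bit counters. A `SharedArrayBuffer` stands in for the OS shared-memory region, both sides are written in TypeScript purely for illustration, and the class name `SpscRingBuffer` is hypothetical.

```typescript
/**
 * Hypothetical SPSC ring buffer for the addon -> driver handoff.
 * Layout assumption: [writeCounter, readCounter] as Int32, then `capacity` Float32 samples.
 */
export class SpscRingBuffer {
  private readonly indices: Int32Array; // [0] = write counter, [1] = read counter
  private readonly samples: Float32Array;
  private readonly mask: number;

  constructor(readonly capacity: number) {
    // Power-of-two capacity keeps index math correct when the Int32 counters wrap.
    if (capacity <= 0 || (capacity & (capacity - 1)) !== 0) {
      throw new RangeError("capacity must be a power of two");
    }
    const sab = new SharedArrayBuffer(8 + capacity * Float32Array.BYTES_PER_ELEMENT);
    this.indices = new Int32Array(sab, 0, 2);
    this.samples = new Float32Array(sab, 8, capacity);
    this.mask = capacity - 1;
  }

  /** Samples written but not yet read. */
  available(): number {
    return (Atomics.load(this.indices, 0) - Atomics.load(this.indices, 1)) | 0;
  }

  /** Producer side: copies as many samples as fit and returns the count written. */
  write(input: Float32Array): number {
    const w = Atomics.load(this.indices, 0);
    const r = Atomics.load(this.indices, 1);
    const n = Math.min(this.capacity - ((w - r) | 0), input.length);
    for (let i = 0; i < n; i++) this.samples[(w + i) & this.mask] = input[i];
    Atomics.store(this.indices, 0, (w + n) | 0); // publish only after the data is in place
    return n;
  }

  /** Consumer side: fills `out`, padding with silence (zeros) on underrun. */
  read(out: Float32Array): number {
    const w = Atomics.load(this.indices, 0);
    const r = Atomics.load(this.indices, 1);
    const n = Math.min((w - r) | 0, out.length);
    for (let i = 0; i < n; i++) out[i] = this.samples[(r + i) & this.mask];
    out.fill(0, n);
    Atomics.store(this.indices, 1, (r + n) | 0);
    return n;
  }
}
```

Padding underruns with zeros means the meeting app hears silence rather than stale audio when the pipeline falls behind. When the ring is full, `write` drops the excess samples instead of overwriting unread audio.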
AudioRouter replaces the Chrome extension's `AudioCaptureModule`, `AudioOutputModule`, and `AudioBridge` with a single unified module that coordinates real mic capture, pipeline routing, and virtual mic output.
```typescript
interface AudioRouterConfig {
  captureDeviceId: string | null; // null = OS default
  captureSampleRate: 16000;
  outputSampleRate: 48000;
  noiseGateThresholdDb: number;
  vadSpeechOnsetMs: number;
  vadSpeechOffsetMs: number;
  ghostModeEnabled: boolean;
}

interface AudioRouter {
  start(config: AudioRouterConfig): void;
  stop(): void;
  setPassthrough(enabled: boolean): void;
  setCaptureDevice(deviceId: string): void;
  setGhostMode(enabled: boolean): void;
  setNoiseGateThreshold(db: number): void;
  getInputLevel(): number;
  getAverageInputLevel(): number;

  // Callbacks wired to PipelineOrchestrator
  onAudioChunk: ((chunk: Buffer, sequenceId: number) => void) | null;
  onVADStateChange: ((state: VADState) => void) | null;
  onSpeechEnd: (() => void) | null;

  // Called by PipelineOrchestrator when TTS audio is ready
  writeTTSAudio(pcm: Buffer): void;
  writeSilence(): void;
  writePassthrough(pcm: Buffer): void;
  fadeOutTTS(durationMs: number): void;
}
```

The AudioRouter reuses the existing pure functions `computeRmsDb`, `transitionVADState`, `transitionRoutingState`, and `getAudioSource` from the current codebase. Only the I/O layer changes (N-API instead of the Web Audio API).
DesktopPipeline wraps the existing `PipelineOrchestrator` with Electron-specific wiring. The core orchestration logic (sequence tracking, backpressure, degradation cascade, stage timeouts) is reused unchanged.

```typescript
interface DesktopPipelineConfig extends PipelineOrchestratorConfig {
  // Desktop-specific additions
  virtualMicWriteLatencyMs: number;
  volumeNormalizationEnabled: boolean;
  referenceVolumeCalibrationMs: number;
}

/** Adapts PipelineOrchestrator for Electron main process */
class DesktopPipeline {
  #orchestrator: PipelineOrchestrator;
  #audioRouter: AudioRouter;
  #nativeAddon: NativeAudioAddon;
  #ipcEmitter: ElectronIPCEmitter;
  // Replaces chrome.runtime.sendMessage with Electron IPC
  // Replaces AudioCaptureModule with AudioRouter + NativeAddon
  // Replaces AudioOutputModule with NativeAddon.writeVirtualMic
}
```

DesktopSettingsStore replaces `chrome.storage` with filesystem-based JSON storage, using Node.js `crypto` for AES-GCM-256 encryption.
```typescript
interface DesktopSettingsStore {
  get<K extends keyof SettingsSchema>(key: K): Promise<SettingsSchema[K]>;
  set<K extends keyof SettingsSchema>(key: K, value: SettingsSchema[K]): Promise<void>;
  getAll(): Promise<Partial<SettingsSchema>>;
  exportSettings(): Promise<string>;
  importSettings(json: string): Promise<void>;
  migrateFromVersion(oldVersion: string): Promise<void>;
  // Atomic writes: write to .tmp, then rename
  flush(): Promise<void>;
}
```

Storage paths:

- macOS: `~/Library/Application Support/VoiceBridge/settings.json`
- Windows: `%APPDATA%/VoiceBridge/settings.json`
- Linux: `~/.config/voicebridge/settings.json`
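The `flush()` comment in `DesktopSettingsStore` calls for atomic writes: write to `.tmp`, then rename. The following is a minimal sketch using `node:fs/promises`. It assumes the settings file is small enough to serialize in a single `JSON.stringify` call. `writeJsonAtomic` and `readJson` are hypothetical helper names. The `0o600` file mode is an added assumption and is mostly ignored on Windows.

```typescript
import { mkdir, readFile, rename, writeFile } from "node:fs/promises";
import * as path from "node:path";

/**
 * Write JSON to a temporary file next to the target, then rename it over the
 * target, so a crash mid-write never leaves a truncated settings.json behind.
 */
export async function writeJsonAtomic(filePath: string, data: unknown): Promise<void> {
  await mkdir(path.dirname(filePath), { recursive: true });
  // Same directory => same filesystem, which rename() needs to replace the file in one step.
  const tmpPath = `${filePath}.${process.pid}.tmp`;
  await writeFile(tmpPath, JSON.stringify(data, null, 2), { encoding: "utf8", mode: 0o600 });
  await rename(tmpPath, filePath);
}

/** Read and parse a JSON file written by writeJsonAtomic. */
export async function readJson<T>(filePath: string): Promise<T> {
  return JSON.parse(await readFile(filePath, "utf8")) as T;
}
```

Because the temporary file shares the target's directory, a reader on POSIX systems sees either the complete old file or the complete new one, never a partial write.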
Encryption: API keys are encrypted with AES-GCM-256. The encryption key is derived via PBKDF2 (100,000 iterations, SHA-256) from a machine-specific identifier (os.hostname() + os.userInfo().username) combined with a per-install random salt stored alongside the settings file.
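The following is a minimal sketch of this scheme using Node's built-in `crypto` module. The text specifies the 100,000-iteration PBKDF2-SHA-256 derivation and the hostname + username input. The following details are assumptions:

- the 16-byte salt;
- a fresh random 12-byte IV for every encryption;
- the `iv.tag.ciphertext` base64 storage format.

All function names are hypothetical.

```typescript
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "node:crypto";
import * as os from "node:os";

const PBKDF2_ITERATIONS = 100_000;
const KEY_BYTES = 32; // 256-bit AES key

/** Derive the settings key; by default from hostname + username, as described above. */
export function deriveSettingsKey(salt: Buffer, machineId: string = os.hostname() + os.userInfo().username): Buffer {
  return pbkdf2Sync(machineId, salt, PBKDF2_ITERATIONS, KEY_BYTES, "sha256");
}

/** Encrypt one secret; the "iv.tag.ciphertext" base64 format is an assumption. */
export function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh 96-bit IV per encryption; never reuse one with the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((part) => part.toString("base64")).join(".");
}

/** Decrypt a value produced by encryptSecret; throws if the key or data is wrong. */
export function decryptSecret(encoded: string, key: Buffer): string {
  const [iv, tag, ciphertext] = encoded.split(".").map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag); // final() throws if authentication fails
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}
```

Hostname and username are not secret. The derived key therefore binds the ciphertext to a machine and account: a settings file copied to a host with a different hostname or username will not decrypt. The key does not hide the secrets from other code running as the same user.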
The message bus replaces `chrome.runtime.sendMessage` with Electron's typed IPC.

```typescript
/** Main process → Renderer events (one-way) */
interface MainToRendererEvents {
  'pipeline:latency-update': LatencyMeasurement;
  'pipeline:stage-update': { sequenceId: number; stage: PipelineStage };
  'pipeline:degradation-changed': { level: DegradationLevel; previous: DegradationLevel };
  'session:state-changed': SessionState;
  'connection:state-changed': { service: 'stt' | 'tts' | 'llm'; state: ServiceConnectionState };
  'audio:level': { rmsDb: number; vadState: VADState['status'] };
  'audio:driver-status': DriverStatus;
  'demo:time-update': { voiceTimeRemainingMs: number };
  'error': { code: string; message: string; userMessage: string };
}

/** Renderer → Main process invocations (request/response) */
interface RendererToMainInvocations {
  'session:start': [{ sourceLanguage: string; targetLanguage: string }, void];
  'session:stop': [{ reason: string }, void];
  'settings:get': [{ key: string }, unknown];
  'settings:set': [{ key: string; value: unknown }, void];
  'settings:export': [void, string];
  'settings:import': [string, void];
  'devices:list': [void, AudioDeviceInfo[]];
  'devices:select': [{ deviceId: string }, void];
  'driver:status': [void, DriverStatus];
  'driver:install': [void, DriverInstallResult];
  'driver:uninstall': [void, boolean];
  'voice:start-recording': [void, void];
  'voice:stop-recording': [void, Blob];
  'voice:upload': [void, string];
  'voice:delete': [{ voiceId: string }, void];
  'voice:preview': [{ voiceId: string; text: string; language: string }, ArrayBuffer];
  'languages:list': [void, Language[]];
  'debug:get-log': [void, DebugLogEntry[]];
}
```

```typescript
/** Exposed to renderer via contextBridge.exposeInMainWorld('voicebridge', api) */
interface VoiceBridgeAPI {
  // Session
  startSession(params: { sourceLanguage: string; targetLanguage: string }): Promise<void>;
  stopSession(reason: string): Promise<void>;
  // Settings
  getSetting(key: string): Promise<unknown>;
  setSetting(key: string, value: unknown): Promise<void>;
  exportSettings(): Promise<string>;
  importSettings(json: string): Promise<void>;
  // Audio devices
  listDevices(): Promise<AudioDeviceInfo[]>;
  selectDevice(deviceId: string): Promise<void>;
  // Driver
  getDriverStatus(): Promise<DriverStatus>;
  installDriver(): Promise<DriverInstallResult>;
  // Voice profile
  startRecording(): Promise<void>;
  stopRecording(): Promise<Blob>;
  uploadVoice(): Promise<string>;
  deleteVoice(voiceId: string): Promise<void>;
  previewVoice(voiceId: string, text: string, language: string): Promise<ArrayBuffer>;
  // Languages
  listLanguages(): Promise<Language[]>;
  // Debug
  getDebugLog(): Promise<DebugLogEntry[]>;
  // Events (main → renderer)
  on(event: string, callback: (...args: unknown[]) => void): () => void;
}
```

```mermaid
graph TB
    subgraph "System Tray"
        TI[Tray Icon<br/>State-aware]
        TM[Tray Context Menu]
    end
    subgraph "Main Window (360×480)"
        MW[MainWindow]
        HE[Header<br/>Logo + Status]
        TG[SessionToggle<br/>Mechanical pill]
        LP[LanguagePair<br/>Source ↔ Target]
        LT[LatencyDisplay<br/>Space Mono hero]
        CS[ConnectionStatus<br/>STT / TTS / LLM]
        DL[DegradationLabel<br/>Inline status]
        MD[MicDevice<br/>Dropdown]
        DT[DemoTimer<br/>Segmented bar]
    end
    subgraph "Settings Window"
        SW[SettingsWindow]
        AK[APIKeyInputs<br/>Underline style]
        VP[VoiceProfile<br/>Record / Preview]
        AD[AudioDevices<br/>Input selector]
        TS[TranslationSettings<br/>Context, formality, glossary]
        PS[PerformanceSettings<br/>Latency priority]
        AS[AutoStartToggle]
        KB[KeyboardShortcuts]
        DB[DebugLogView<br/>Ring buffer viewer]
    end
    TI --> MW
    TM --> SW
    MW --> HE
    MW --> TG
    MW --> LP
    MW --> LT
    MW --> CS
    MW --> DL
    MW --> MD
    MW --> DT
```
The UI uses Preact (3KB gzipped) with the Nothing design system CSS tokens. No React, no Tailwind, no state management libraries — plain Preact with useReducer for local state and the contextBridge API for main process communication.
```typescript
interface DriverInstaller {
  checkInstalled(): DriverStatus;
  install(): Promise<DriverInstallResult>;
  uninstall(): Promise<boolean>;
  checkVersionCompatibility(bundledVersion: string): boolean;
  verifyDevicePresent(): boolean;
}
```

On macOS and Windows, installation requires elevated privileges. The app requests elevation with `sudo-prompt` on macOS and an `electron-sudo`-style pattern on Windows. On Linux, PulseAudio module loading is user-space and requires no elevation.

```typescript
interface AutoStartManager {
  isEnabled(): Promise<boolean>;
  enable(): Promise<void>;
  disable(): Promise<void>;
}
```

AutoStartManager uses `app.setLoginItemSettings()` on macOS and Windows. On Linux, it creates or removes a `.desktop` file in `~/.config/autostart/`.
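On Linux, enabling auto-start therefore amounts to writing a freedesktop.org Desktop Entry into `~/.config/autostart/`. The following is a minimal sketch of building that file. Several details are hypothetical:

- the file name `voicebridge.desktop`;
- the function names;
- the `--hidden` launch flag used in the usage example.

As a simplification, the sketch rejects arguments containing characters that the Exec key would require escaping, instead of escaping them.

```typescript
import * as path from "node:path";

/** Hypothetical autostart entry location under the directory named in the text. */
export function autostartEntryPath(homeDir: string): string {
  return path.posix.join(homeDir, ".config", "autostart", "voicebridge.desktop");
}

/** Build a Desktop Entry that launches the app at login. */
export function buildAutostartEntry(execPath: string, args: string[] = []): string {
  const quote = (arg: string): string => {
    // Simplification: refuse characters that need Exec-key escaping rather than escaping them.
    if (/["`$\\%]/.test(arg)) throw new Error(`unsupported character in Exec argument: ${arg}`);
    return /\s/.test(arg) ? `"${arg}"` : arg; // quote arguments containing whitespace
  };
  return [
    "[Desktop Entry]",
    "Type=Application",
    "Name=VoiceBridge",
    `Exec=${[execPath, ...args].map(quote).join(" ")}`,
    "Hidden=false",
    "X-GNOME-Autostart-enabled=true",
    "",
  ].join("\n");
}
```

Disabling auto-start is then just deleting the file at `autostartEntryPath(os.homedir())`.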
The settings schema is largely reused from the Chrome extension, with these changes:
```typescript
interface DesktopSettingsSchema {
  // ── Encrypted (AES-GCM-256) ──
  elevenLabsApiKey: string;
  llmApiKey: string;

  // ── Plaintext JSON ──
  // Language
  sourceLanguage: string;
  targetLanguage: string;
  recentLanguages: string[];
  // LLM
  llmProvider: LLMProvider;
  openRouterModel: string;
  contextWindowSize: number;
  preserveTechnicalTerms: boolean;
  customGlossary: GlossaryEntry[];
  meetingContext: string;
  formalityLevel: 'formal' | 'informal';
  // Voice
  voiceProfileId: string;
  voiceStability: number;
  voiceSimilarityBoost: number;
  voiceStyle: number;
  // Audio
  selectedMicDeviceId: string | null;
  noiseGateThresholdDb: number;
  vadSensitivity: 'low' | 'medium' | 'high';
  ghostMode: boolean;
  // App
  autoStartEnabled: boolean;
  theme: 'dark' | 'light' | 'system';
  keyboardShortcuts: {
    toggleTranslation: string;
    ghostMode: string;
    panicStop: string;
  };
  // Demo
  demoMode: boolean;
  demoUsage: DemoUsageState;
  embeddedKeyExhausted: boolean;
  embeddedKeyLastChecked: number;
  // Internal
  installId: string;
  onboardingComplete: boolean;
  settingsSchemaVersion: number;
  driverVersion: string | null;
  // Cache
  languageCache: { stt: string[]; tts: string[]; cachedAt: number };
}
```

The `PipelineUtterance`, `SessionState`, `LatencyMeasurement`, `EchoState`, `VADState`, `AudioRoutingState`, and `DegradationLevel` types are reused unchanged from `src/lib/types.ts`.
```typescript
/** Validated IPC message envelope */
interface IPCMessage<T extends string = string> {
  channel: T;
  payload: unknown;
  timestamp: number;
  nonce: string; // Unique per message, prevents replay
}

/** Audio data flowing through the pipeline */
interface AudioChunk {
  pcm: Buffer; // Raw PCM data
  sampleRate: number; // 16000 (capture) or 48000 (output)
  channels: 1;
  format: 'pcm_s16le' | 'pcm_f32le';
  sequenceId: number;
  timestamp: number;
}

/** Bundled with each platform build */
interface DriverManifest {
  version: string;
  platform: 'darwin' | 'win32' | 'linux';
  arch: 'x64' | 'arm64';
  driverBinaryPath: string;
  installScript: string;
  uninstallScript: string;
  checksum: string;
}
```

A property is a characteristic or behavior that should hold true across all valid executions of a system; essentially, a formal statement about what the system should do. Properties serve as the bridge between human-readable specifications and machine-verifiable correctness guarantees.
Property 1: Audio chunking invariant. For any sequence of PCM Int16 audio samples of arbitrary length fed into the audio chunker, every emitted chunk SHALL be exactly 4000 samples (250ms at 16kHz), and the total number of samples across all emitted chunks SHALL equal the largest multiple of 4000 that is ≤ the total input sample count.
Validates: Requirements 2.4
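The following is a minimal stateful chunker that satisfies this invariant. It accepts Int16 buffers of any size, emits only full 4000-sample chunks, and carries the remainder over to the next call. The class name `AudioChunker` and its callback-style API are hypothetical.

```typescript
/** Hypothetical chunker: re-slices arbitrary PCM Int16 input into fixed-size chunks. */
export class AudioChunker {
  private readonly buffer: Int16Array;
  private filled = 0;

  constructor(
    private readonly onChunk: (chunk: Int16Array) => void,
    private readonly chunkSamples = 4000, // 250 ms at 16 kHz
  ) {
    this.buffer = new Int16Array(chunkSamples);
  }

  push(samples: Int16Array): void {
    let offset = 0;
    while (offset < samples.length) {
      const take = Math.min(this.chunkSamples - this.filled, samples.length - offset);
      this.buffer.set(samples.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === this.chunkSamples) {
        this.onChunk(this.buffer.slice()); // copy so the consumer owns its chunk
        this.filled = 0;
      }
    }
  }

  /** Samples held back because they do not yet fill a chunk. */
  get pendingSamples(): number {
    return this.filled;
  }
}
```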
Property 2: TTS resampling correctness. For any PCM Int16 buffer at 24kHz, resampling to 48kHz Float32 SHALL produce an output buffer with exactly 2× the input sample count, and every output sample SHALL be in the range [-1.0, 1.0].
Validates: Requirements 2.5, 4.1
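Linear interpolation is one way to meet the 2× length and range requirements: each 24 kHz sample produces itself plus the midpoint to the next sample. This is a sketch only. Linear interpolation is not band-limited, the function name is hypothetical, and the production path may instead go through the native addon's `resample()`.

```typescript
/**
 * Sketch: 24 kHz Int16 -> 48 kHz Float32 by linear interpolation.
 * Output[2i] is the input sample; output[2i+1] is the midpoint to the next
 * sample (the last sample is repeated), so the output is exactly 2x as long.
 */
export function upsample24kTo48k(input: Int16Array): Float32Array {
  const out = new Float32Array(input.length * 2);
  for (let i = 0; i < input.length; i++) {
    const current = input[i] / 32768; // Int16 -> [-1.0, 1.0)
    const next = input[Math.min(i + 1, input.length - 1)] / 32768;
    out[2 * i] = current;
    out[2 * i + 1] = (current + next) / 2; // average of two in-range values stays in range
  }
  return out;
}
```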
Property 3: Audio routing state machine. For any valid AudioRoutingState and any AudioRoutingEvent, the transitionRoutingState function SHALL return a valid AudioRoutingState, and the getAudioSource function SHALL return the correct audio source: 'mic' for PASSTHROUGH, 'silence' for MUTED, 'tts' for TTS_PLAYING, and 'mic-fade-tts' for BARGE_IN.
Validates: Requirements 2.6, 4.3, 4.4
Property 4: VAD state transitions. For any sequence of energy levels (in dB) and timestamps, the transitionVADState function SHALL only transition from silence to speech after the energy has been above the threshold for at least onsetDelayMs consecutive milliseconds, and SHALL only transition from speech to silence after the energy has been below the threshold for at least offsetDelayMs consecutive milliseconds.
Validates: Requirements 2.8
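The onset/offset hysteresis can be expressed as a pure step function over (state, energy, timestamp). This document does not show the signature of the existing `transitionVADState`, so the sketch below uses hypothetical names (`stepVad`, `VadSketchState`). It measures each delay from the first frame whose energy points toward the opposite state.

```typescript
type VadStatus = "silence" | "speech";

interface VadSketchState {
  status: VadStatus;
  /** When energy first started pointing toward the other state, or null. */
  pendingSince: number | null;
}

interface VadConfig {
  thresholdDb: number;
  onsetDelayMs: number;
  offsetDelayMs: number;
}

export function stepVad(state: VadSketchState, energyDb: number, nowMs: number, cfg: VadConfig): VadSketchState {
  const above = energyDb > cfg.thresholdDb;
  const pullsTowardFlip = state.status === "silence" ? above : !above;
  if (!pullsTowardFlip) {
    // Energy agrees with the current state: cancel any pending transition.
    return { status: state.status, pendingSince: null };
  }
  const since = state.pendingSince ?? nowMs;
  const requiredMs = state.status === "silence" ? cfg.onsetDelayMs : cfg.offsetDelayMs;
  if (nowMs - since >= requiredMs) {
    return { status: state.status === "silence" ? "speech" : "silence", pendingSince: null };
  }
  return { status: state.status, pendingSince: since };
}
```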
Property 5: Noise gate filtering. For any PCM Int16 audio frame, if computeRmsDb(frame) returns a value ≤ the configured noise gate threshold, the frame SHALL be rejected (not forwarded to STT). If the value is > the threshold, the frame SHALL pass through.
Validates: Requirements 2.9
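The following sketch of the gate assumes `computeRmsDb` reports dBFS with Int16 samples normalized by 32768. `computeRmsDbSketch` and `passesNoiseGate` are stand-in names, not the codebase's actual functions. The comparison is a strict `>`, so a frame exactly at the threshold is rejected, as the property requires.

```typescript
/** RMS level of a PCM Int16 frame in dBFS; an all-zero or empty frame yields -Infinity. */
export function computeRmsDbSketch(frame: Int16Array): number {
  if (frame.length === 0) return -Infinity;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const x = frame[i] / 32768; // normalize to [-1, 1)
    sumSquares += x * x;
  }
  // 10*log10(mean square) equals 20*log10(rms)
  return 10 * Math.log10(sumSquares / frame.length);
}

/** Forward to STT only when the level is strictly above the threshold. */
export function passesNoiseGate(frame: Int16Array, thresholdDb: number): boolean {
  return computeRmsDbSketch(frame) > thresholdDb;
}
```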
Property 6: Echo cancellation state machine. For any sequence of EchoEvent values, the transitionEchoState function SHALL enforce: (1) from listening, only tts_start transitions to speaking; (2) from speaking, only tts_end transitions to transitioning and only barge_in transitions to listening; (3) from transitioning, only transition_complete or barge_in transitions to listening. The mic SHALL be muted in speaking and transitioning states only.
Validates: Requirements 3.5
Property 7: Degradation level computation. For any ServiceHealth state (with each service being connected, disconnected, connecting, or error), computeDegradationLevel SHALL return: 'full' iff all three services are connected, 'text-only' iff STT and LLM are connected but TTS is not, 'transcription-only' iff STT is connected but LLM is not, and 'passthrough' iff STT is not connected.
Validates: Requirements 3.7
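Checking the services in cascade order (STT, then LLM, then TTS) assigns exactly one level to each of the 64 possible health combinations. The case where STT and TTS are up but the LLM is down maps to `transcription-only`, since nothing can be translated. Type and function names in this sketch are hypothetical.

```typescript
type ConnState = "connected" | "disconnected" | "connecting" | "error";
type DegradationLevelSketch = "full" | "text-only" | "transcription-only" | "passthrough";

interface ServiceHealthSketch {
  stt: ConnState;
  llm: ConnState;
  tts: ConnState;
}

export function computeDegradationLevelSketch(h: ServiceHealthSketch): DegradationLevelSketch {
  if (h.stt !== "connected") return "passthrough"; // nothing to transcribe with
  if (h.llm !== "connected") return "transcription-only"; // cannot translate
  if (h.tts !== "connected") return "text-only"; // can translate, cannot speak
  return "full";
}
```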
Property 8: Backpressure queue limits. For any sequence of utterance arrivals and completions, the pipeline SHALL never have more than maxQueueSize (3) utterances in CAPTURED or TRANSCRIBED state simultaneously, and SHALL never have more than maxActiveUtterances (10) utterances in the tracking map. Excess utterances SHALL be dropped with reason 'backpressure'.
Validates: Requirements 3.10
Property 9: Volume normalization. For any reference RMS level measured from the real microphone and any TTS audio buffer, the normalized output RMS SHALL be within ±3dB of the reference level.
Validates: Requirements 4.7
Property 10: Latency color mapping. For any latency value in milliseconds, the color mapping SHALL return 'green' for values < 1500, 'yellow' for values in [1500, 2500], and 'red' for values > 2500.
Validates: Requirements 5.6
Property 11: Language filtering. For any source language code and search query string, the filtered target language list SHALL (1) never contain the selected source language, and (2) contain only languages whose name or code matches the search query (case-insensitive substring match).
Validates: Requirements 5.5, 6.2
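The following is a direct transcription of the two filtering rules. `LanguageSketch` is a hypothetical stand-in for the document's `Language` type and is assumed to carry a `code` and a display `name`.

```typescript
interface LanguageSketch {
  code: string;
  name: string;
}

/** Exclude the source language, then keep case-insensitive substring matches on name or code. */
export function filterTargetLanguages(all: LanguageSketch[], sourceCode: string, query: string): LanguageSketch[] {
  const q = query.toLowerCase(); // an empty query matches everything
  return all.filter(
    (lang) =>
      lang.code !== sourceCode &&
      (lang.name.toLowerCase().includes(q) || lang.code.toLowerCase().includes(q)),
  );
}
```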
Property 12: Encryption round-trip. For any non-empty string, encrypting with AES-GCM-256 (using PBKDF2-derived key from machine identifier + salt) and then decrypting SHALL produce the original string.
Validates: Requirements 6.8, 10.3, 12.2
Property 13: Settings persistence round-trip. For any valid DesktopSettingsSchema object, writing all settings to the store and then reading them back SHALL produce an object equal to the original. Encrypted fields SHALL be stored as ciphertext on disk and decrypted transparently on read.
Validates: Requirements 12.1, 12.3, 6.10
Property 14: Settings export excludes secrets. For any settings state (including populated API keys), the exportSettings function SHALL produce a JSON string that does NOT contain the keys elevenLabsApiKey or llmApiKey, and SHALL contain all non-sensitive settings.
Validates: Requirements 12.5
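The following sketch of an export drops the two encrypted fields before serializing. The function name is hypothetical. An allowlist of exportable keys would be a stricter alternative to this denylist, because any secret field added to the schema later would then be excluded by default.

```typescript
const SECRET_KEYS: readonly string[] = ["elevenLabsApiKey", "llmApiKey"];

/** Serialize every setting except the encrypted API keys. */
export function exportSettingsSketch(settings: Record<string, unknown>): string {
  const publicSettings = Object.fromEntries(
    Object.entries(settings).filter(([key]) => !SECRET_KEYS.includes(key)),
  );
  return JSON.stringify(publicSettings, null, 2);
}
```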
Property 15: Settings import validation. For any JSON string, importSettings SHALL accept it only if it conforms to the DesktopSettingsSchema (correct types for all present fields). Invalid JSON or mistyped fields SHALL cause the import to reject without modifying the current settings.
Validates: Requirements 12.6
Property 16: Settings migration. For any settings object conforming to an older schema version, the migration function SHALL produce a valid object conforming to the current schema version, preserving all compatible field values and applying defaults for new fields.
Validates: Requirements 12.10
Property 17: IPC message validation. For any IPC message object, the validator SHALL accept it if and only if it has a valid channel string matching a known channel name, a payload matching the expected type for that channel, a numeric timestamp, and a string nonce. All other shapes SHALL be rejected.
Validates: Requirements 10.10
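The following sketch of an envelope validator for this property uses a table of per-channel payload guards. For illustration, only four channels from `RendererToMainInvocations` are filled in. The guard table and function names are hypothetical. The channel lookup uses `hasOwnProperty`, which prevents inherited names such as `toString` from being accepted as channels.

```typescript
type PayloadGuard = (payload: unknown) => boolean;

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

/** Hypothetical subset of the channel table; a real one would cover every channel. */
const PAYLOAD_GUARDS: Record<string, PayloadGuard> = {
  "session:start": (p) => isObject(p) && typeof p.sourceLanguage === "string" && typeof p.targetLanguage === "string",
  "session:stop": (p) => isObject(p) && typeof p.reason === "string",
  "settings:import": (p) => typeof p === "string",
  "devices:list": (p) => p === undefined,
};

export function isValidIpcMessage(msg: unknown): boolean {
  if (!isObject(msg)) return false;
  const { channel, payload, timestamp, nonce } = msg;
  if (typeof channel !== "string") return false;
  if (!Object.prototype.hasOwnProperty.call(PAYLOAD_GUARDS, channel)) return false; // unknown channel
  if (typeof timestamp !== "number") return false;
  if (typeof nonce !== "string") return false;
  return PAYLOAD_GUARDS[channel](payload);
}
```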
Property 18: Demo voice-time tracking. For any sequence of VAD-active speech durations within a 24-hour rolling window, the cumulative voice time SHALL be correctly tracked, and the session SHALL be stopped when the cumulative time reaches 300,000ms (5 minutes). The window SHALL roll forward such that usage older than 24 hours is no longer counted.
Validates: Requirements 11.3
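The following is a sketch of the rolling-window accounting. It assumes usage is recorded as VAD speech segments (a start timestamp plus a duration). As a simplification for short bursts of speech, each segment is attributed entirely to its start time. Class and method names are hypothetical.

```typescript
const WINDOW_MS = 24 * 60 * 60 * 1000; // 24-hour rolling window
const DEMO_LIMIT_MS = 300_000; // 5 minutes of voice time

interface UsageSegment {
  startedAt: number;
  durationMs: number;
}

export class DemoVoiceTimeTracker {
  private segments: UsageSegment[] = [];

  record(startedAt: number, durationMs: number): void {
    this.segments.push({ startedAt, durationMs });
  }

  /** Prunes segments older than the window (assumes nowMs never goes backwards). */
  usedMs(nowMs: number): number {
    this.segments = this.segments.filter((s) => s.startedAt > nowMs - WINDOW_MS);
    return this.segments.reduce((sum, s) => sum + s.durationMs, 0);
  }

  remainingMs(nowMs: number): number {
    return Math.max(0, DEMO_LIMIT_MS - this.usedMs(nowMs));
  }

  shouldStop(nowMs: number): boolean {
    return this.usedMs(nowMs) >= DEMO_LIMIT_MS;
  }
}
```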
Property 19: Debug log ring buffer. For any number of log entries added (including numbers > 500), the debug log buffer size SHALL never exceed 500 entries. When the buffer is full, the oldest entry SHALL be evicted first (FIFO).
Validates: Requirements 7.7
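A fixed-capacity ring buffer gives O(1) insertion with FIFO eviction. The class name is hypothetical. The 500-entry default comes from the text.

```typescript
/** Fixed-capacity log buffer; when full, each new entry overwrites the oldest. */
export class DebugLogRing<T> {
  private readonly entries: (T | undefined)[];
  private start = 0; // index of the oldest entry
  private count = 0;

  constructor(private readonly capacity = 500) {
    this.entries = new Array<T | undefined>(capacity);
  }

  add(entry: T): void {
    const end = (this.start + this.count) % this.capacity;
    this.entries[end] = entry;
    if (this.count < this.capacity) {
      this.count++;
    } else {
      this.start = (this.start + 1) % this.capacity; // the overwrite evicted the oldest entry
    }
  }

  get size(): number {
    return this.count;
  }

  /** Entries from oldest to newest. */
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.count; i++) out.push(this.entries[(this.start + i) % this.capacity] as T);
    return out;
  }
}
```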
Property 20: Consecutive high-latency detection. For any sequence of end-to-end latency measurements, the high-latency warning SHALL be emitted if and only if 5 or more consecutive measurements exceed 3000ms. The counter SHALL reset to 0 when a measurement ≤ 3000ms is observed.
Validates: Requirements 7.4
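The consecutive-measurement rule needs only a counter. In this hypothetical sketch, `observe()` returns true for every measurement while the current run is at least five long. A caller that wants a single warning per run could emit only when the count first reaches five.

```typescript
export class HighLatencyDetector {
  private consecutive = 0;

  constructor(
    private readonly thresholdMs = 3000,
    private readonly requiredCount = 5,
  ) {}

  /** Record one end-to-end latency; true while the run of slow measurements is long enough. */
  observe(latencyMs: number): boolean {
    // Strictly greater than the threshold counts as slow; anything else resets the run.
    this.consecutive = latencyMs > this.thresholdMs ? this.consecutive + 1 : 0;
    return this.consecutive >= this.requiredCount;
  }
}
```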
Property 21: Platform mapping. For any platform identifier ('darwin', 'win32', 'linux'), the platform mapping functions SHALL return: (1) the correct config directory path, (2) the correct driver implementation class, and (3) keyboard shortcuts with Cmd substituted for Ctrl on 'darwin' and Ctrl preserved on other platforms.
Validates: Requirements 9.5, 9.10, 10.4
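The following is a sketch of the three platform mappings. The config directories match the settings storage paths given in the settings section. The driver backend identifiers are hypothetical labels that stand in for the implementation classes. The `%APPDATA%` fallback to `AppData\Roaming` under the home directory is an assumption.

```typescript
import * as path from "node:path";

type SupportedPlatform = "darwin" | "win32" | "linux";

/** Directory that holds settings.json on each platform. */
export function configDir(platform: SupportedPlatform, home: string, appData?: string): string {
  switch (platform) {
    case "darwin":
      return path.posix.join(home, "Library", "Application Support", "VoiceBridge");
    case "win32":
      // appData would normally come from process.env.APPDATA
      return path.win32.join(appData ?? path.win32.join(home, "AppData", "Roaming"), "VoiceBridge");
    case "linux":
      return path.posix.join(home, ".config", "voicebridge");
  }
}

/** Hypothetical identifiers for the per-platform driver implementations. */
export const DRIVER_BACKENDS: Record<SupportedPlatform, string> = {
  darwin: "coreaudio-hal-plugin",
  win32: "wasapi-virtual-endpoint",
  linux: "pulseaudio-null-sink",
};

/** Substitute Cmd for Ctrl on macOS; leave shortcuts unchanged elsewhere. */
export function localizeShortcut(shortcut: string, platform: SupportedPlatform): string {
  if (platform !== "darwin") return shortcut;
  return shortcut.split("+").map((key) => (key === "Ctrl" ? "Cmd" : key)).join("+");
}
```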
| Category | Examples | Strategy |
|---|---|---|
| Driver errors | Installation failure, driver crash, device disappearance | Display OS error code, offer retry, fall back to passthrough |
| Audio device errors | Mic disconnected, permission denied, format mismatch | Pause session, notify user, attempt fallback to default device |
| Pipeline service errors | WebSocket disconnect, API timeout, rate limiting | Reuse existing degradation cascade (full → text-only → transcription-only → passthrough) |
| Encryption errors | Corrupt settings file, key derivation failure | Reset to defaults, prompt for API key re-entry |
| IPC errors | Malformed message, renderer crash | Log and reject invalid messages; restart renderer if crashed |
```mermaid
stateDiagram-v2
    [*] --> Healthy
    Healthy --> ServiceDegraded: WebSocket disconnect
    ServiceDegraded --> Healthy: Reconnect success
    ServiceDegraded --> FurtherDegraded: Second service fails
    FurtherDegraded --> Passthrough: All services down
    Passthrough --> Healthy: Services recover
    Healthy --> MicError: Mic disconnected
    MicError --> Healthy: Fallback to default mic
    MicError --> SessionPaused: No mic available
    Healthy --> DriverError: Virtual mic driver crash
    DriverError --> SessionPaused: Notify user
    SessionPaused --> Healthy: User restarts session
```
The panic stop executes a deterministic cleanup sequence, reusing the existing `executeCleanupSequence`:

- Stop audio capture (N-API `stopCapture`)
- Disconnect STT WebSocket
- Destroy translation engine
- Disconnect TTS WebSocket
- Write silence to virtual mic
- Clear all in-memory transcripts and audio buffers
- Clear latency monitor
- Emit `session:state-changed` with `active: false`
Each step is try/catch wrapped — failures are logged but don't block subsequent steps.
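The try/catch-per-step rule can be sketched as a sequential runner that records each failure and moves on to the next step. `executeCleanupSequence` already exists in the codebase, but its signature is not shown here. `runCleanupSequence` and its step and report types are therefore hypothetical stand-ins.

```typescript
interface CleanupStep {
  name: string;
  run: () => void | Promise<void>;
}

interface CleanupReport {
  completed: string[];
  failed: { name: string; error: string }[];
}

/** Run every step in order; a failing step is logged and never blocks the next one. */
export async function runCleanupSequence(
  steps: CleanupStep[],
  log: (msg: string) => void = () => {},
): Promise<CleanupReport> {
  const report: CleanupReport = { completed: [], failed: [] };
  for (const step of steps) {
    try {
      await step.run(); // steps run strictly one after another
      report.completed.push(step.name);
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      log(`cleanup step "${step.name}" failed: ${error}`);
      report.failed.push({ name: step.name, error });
    }
  }
  return report;
}
```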
The project uses fast-check (already a devDependency) for property-based testing, with vitest as the test runner.

Configuration:

- Minimum 100 iterations per property test
- Each test tagged with: `Feature: desktop-app-rewrite, Property {N}: {title}`
- Tests run via `vitest run` (not watch mode)
Property tests cover:
- Audio chunking invariant (Property 1)
- TTS resampling correctness (Property 2)
- Audio routing state machine (Property 3)
- VAD state transitions (Property 4)
- Noise gate filtering (Property 5)
- Echo cancellation state machine (Property 6)
- Degradation level computation (Property 7)
- Backpressure queue limits (Property 8)
- Volume normalization (Property 9)
- Latency color mapping (Property 10)
- Language filtering (Property 11)
- Encryption round-trip (Property 12)
- Settings persistence round-trip (Property 13)
- Settings export excludes secrets (Property 14)
- Settings import validation (Property 15)
- Settings migration (Property 16)
- IPC message validation (Property 17)
- Demo voice-time tracking (Property 18)
- Debug log ring buffer (Property 19)
- Consecutive high-latency detection (Property 20)
- Platform mapping (Property 21)

Example-based unit tests cover:
- Driver version compatibility check (1.10)
- Driver installation error display (1.7)
- Default mic fallback when none selected (2.3)
- Ghost Mode threshold and gain values (2.10)
- Barge-in timing (4.5)
- Tray icon state mapping (5.2)
- Connection status indicator text (5.7)
- Degradation level display text (5.8)
- Default voice fallback (6.7)
- Cache TTL staleness (6.4)
- Auto-start default off (8.1)
- Transcript clearing on session end (10.2)
- Panic stop cleanup sequence (10.7)
- Demo mode activation logic (11.1, 11.2, 11.5, 11.6, 11.7)
- Settings change immediate effect (12.9)

Integration tests cover:
- Driver installation/uninstallation per platform (1.2–1.6, 1.8, 1.9)
- Audio device enumeration (2.1, 2.2)
- Mic disconnection recovery (2.7)
- Virtual mic write latency benchmark (4.2)
- Passthrough latency benchmark (4.6)
- Driver failure detection timing (4.8)
- Auto-start registration per platform (8.2–8.4, 8.7)
- Settings file atomic write (12.7, 12.8)

Smoke tests cover:
- Driver presence check on startup (1.1, 8.6)
- No audio files written to disk (10.1)
- No telemetry or tracking (10.5, 10.6)
- API keys not exposed to renderer (10.8)
- No plaintext demo keys in source (11.10)
```
desktop/
├── src/
│   ├── main/          # Electron main process
│   ├── renderer/      # Preact UI
│   ├── native/        # N-API addon (Rust)
│   └── shared/        # Reused pipeline modules
└── tests/
    ├── properties/    # Property-based tests (fast-check)
    ├── unit/          # Example-based unit tests
    ├── integration/   # Platform-specific integration tests
    └── smoke/         # Smoke tests
```