OpenZoom is a Windows-only camera magnifier built around Qt 6, Media Foundation, Direct3D 12, and an optional CUDA processing path. The current codebase already supports live camera capture, CPU and GPU frame processing, a two-stage preset/advanced UI, rotation-aware presentation, persistent settings, photo snapshots, and H.264 MP4 recording.
- Media Foundation camera enumeration with per-device mode listing (width x height @ fps).
- CPU frame pipeline for format conversion, rotation, black-and-white thresholding, zoom, Gaussian blur, temporal smoothing, and debug compositing.
- Direct3D 12 presenter for swap-chain output plus GPU texture readback for recording and snapshots.
- CUDA external-memory interop path with black-and-white, zoom, Gaussian blur, temporal smoothing, focus marker, and spatial sharpening via NVIDIA NIS or AMD FSR 1.0 style kernels.
- Stage-1 quick modes backed by full stage-2 advanced configurations, including promotion of advanced tuning into user-defined quick options.
- CPU fallback when CUDA interop is unavailable or debug view is enabled.
- OCR via tesseract.exe, VLM via an OpenAI-compatible HTTP endpoint, and an in-app assistive overlay.
- Session persistence in %APPDATA%\OpenZoom\OpenZoom\settings.json.
- Processed output capture to output/img/IMG_*.jpg and output/vid/VID_*.mp4 next to the executable.
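As a rough illustration of the black-and-white thresholding stage listed above, the sketch below turns each pixel of a packed frame into pure black or pure white based on its luma. The function name `thresholdBgra`, the BGRA8 layout, and the integer Rec. 601 weights are assumptions made for this example; OpenZoom's actual pipeline may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not from the OpenZoom sources): thresholds a tightly
// packed BGRA8 buffer in place. Layout and luma weights are assumptions.
inline void thresholdBgra(std::vector<std::uint8_t>& bgra, std::uint8_t threshold) {
    assert(bgra.size() % 4 == 0);  // four bytes per pixel
    for (std::size_t i = 0; i < bgra.size(); i += 4) {
        const int b = bgra[i];
        const int g = bgra[i + 1];
        const int r = bgra[i + 2];
        // Integer approximation of Rec. 601 luma: (77 R + 150 G + 29 B) / 256.
        const int luma = (77 * r + 150 * g + 29 * b) >> 8;
        const std::uint8_t v = (luma >= threshold) ? 255 : 0;
        bgra[i] = v;
        bgra[i + 1] = v;
        bgra[i + 2] = v;
        // The alpha byte at bgra[i + 3] is left untouched.
    }
}
```

A GPU version of the same idea would apply the identical per-pixel test in a kernel, one thread per pixel.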
OpenZoom is no longer just a CPU-only shell. The CPU path is the most deterministic debugging path, while the CUDA path is active when the D3D12/CUDA interop surface initializes successfully. When GPU processing cannot be used, the app falls back to the CPU pipeline automatically and exposes that state in the UI.
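The fallback rule described here can be sketched as a small decision function. The type and function names (`PipelineState`, `selectBackend`, `backendLabel`) and the label strings are hypothetical and only illustrate the stated conditions: the CPU path is used when CUDA interop is unavailable or the debug view is enabled.

```cpp
#include <string>

// Hypothetical names for illustration; not OpenZoom's actual types.
enum class Backend { Cuda, Cpu };

struct PipelineState {
    bool cudaBuildEnabled;    // built with the CUDA path compiled in
    bool interopInitialized;  // D3D12/CUDA interop surface came up successfully
    bool debugViewEnabled;    // debug composite grid is a CPU-only view
};

// Picks the processing backend from the conditions stated in the text.
inline Backend selectBackend(const PipelineState& s) {
    if (!s.cudaBuildEnabled || !s.interopInitialized) {
        return Backend::Cpu;  // GPU path cannot be used
    }
    if (s.debugViewEnabled) {
        return Backend::Cpu;  // debug view forces the CPU composite
    }
    return Backend::Cuda;
}

// Text a UI could show so the active path is visible to the user.
inline std::string backendLabel(Backend b) {
    return b == Backend::Cuda ? "GPU (CUDA)" : "CPU fallback";
}
```

Keeping the decision in one pure function makes the fallback easy to test without a GPU present.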
- Windows 10 or Windows 11.
- Visual Studio 2022 with the Desktop C++ workload and Windows SDK.
- Qt 6.9.3 for msvc2022_64, or matching overrides via QT_PREFIX / Qt6_DIR.
- CMake 3.23 or newer.
- NVIDIA GPU plus CUDA Toolkit 13.x if you want the CUDA path.
From a Visual Studio x64 developer prompt or a PowerShell 7 session (pwsh.exe) with MSVC, Qt, and optionally CUDA on PATH:

    scripts\build_and_run.bat

From the WSL/Linux agent shell, run Windows-side commands through PowerShell 7 with pwsh.exe -NoProfile -Command '...', for example:

    pwsh.exe -NoProfile -Command 'Get-Date'

The helper script:
- configures build\ with the Visual Studio 2022 generator,
- clears stale CMake cache entries when the source path changes,
- builds open_zoom,
- prefers build\cmake\Release\open_zoom.exe when launching,
- runs windeployqt automatically when it can find the Qt runtime.
If Windows reports missing Qt6*.dll files, add the Qt bin directory to PATH or point QT_PREFIX / Qt6_DIR at the correct installation.
To configure and build with CMake presets:

    cmake --preset msvc-release
    cmake --build --preset msvc-release-build

Available presets live in cmake/CMakePresets.json:

- msvc-debug
- msvc-release
- msvc-cpu
To configure a CPU-only build manually, without presets:

    cmake -S . -B build-cpu -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="C:/Qt/6.9.3/msvc2022_64" -DOPENZOOM_ENABLE_CUDA=OFF
    cmake --build build-cpu

To create a release bundle, run:

    scripts\build_release_bundle.bat

This produces dist\OpenZoom\ with open_zoom.exe, Qt runtime files, LICENSE, README.txt, and THIRD_PARTY_LICENSES.md. CUDA redistributables are copied when they are available on the machine.
- Quick Modes is the primary stage-1 UI: choose task-oriented presets like reading, high contrast, OCR assist, or scene explain.
- Advanced Tuning opens the stage-2 panel with the full set of low-level controls.
- Save As Quick Option promotes the current advanced setup into a reusable stage-1 preset.
- Camera selects the active Media Foundation device.
- Modes shows the discovered capture modes for the selected camera.
- Rotation rotates the pipeline in 90 degree clockwise steps before downstream processing.
- Black & White applies thresholded monochrome conversion.
- Zoom enables the magnifier and focus-point controls.
- Gaussian Blur applies the CPU or CUDA blur stage with configurable sigma and supported discrete radii.
- Temporal Smooth applies an exponential running average.
- OCR Assist, Scene Explain, and Assistive Overlay drive asynchronous assistive analysis and on-screen text overlays.
- Spatial Sharpen enables the CUDA sharpening/upscaling stage and lets you choose NIS or FSR-style processing when the GPU path is active.
- Debug View switches to the CPU composite grid so intermediate stages can be inspected.
- Show Focus Point overlays the current zoom center on the presented output.
- Capture Photo saves the current processed frame to output/img/.
- Start Recording writes processed video to output/vid/ as H.264 MP4, with a 12 hour cap per file.
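To make the Temporal Smooth control concrete, here is a minimal sketch of an exponential running average over frames. The class name `TemporalSmoother`, the float sample format, and the `alpha` parameter are assumptions for illustration; the text only states that the stage is an exponential running average.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: per-sample exponential running average across frames.
// alpha near 1 follows the newest frame closely; small alpha smooths more.
class TemporalSmoother {
public:
    explicit TemporalSmoother(float alpha) : alpha_(alpha) {
        assert(alpha > 0.0f && alpha <= 1.0f);
    }

    // Blends the new frame into the running state and returns the result.
    const std::vector<float>& push(const std::vector<float>& frame) {
        if (state_.size() != frame.size()) {
            state_ = frame;  // first frame or resolution change: reset history
            return state_;
        }
        for (std::size_t i = 0; i < frame.size(); ++i) {
            // state = (1 - alpha) * state + alpha * frame, written incrementally
            state_[i] += alpha_ * (frame[i] - state_[i]);
        }
        return state_;
    }

private:
    float alpha_;
    std::vector<float> state_;
};
```

Resetting the history when the frame size changes avoids blending frames from different capture modes.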
- Ctrl + mouse wheel: zoom around the cursor.
- Mouse wheel: pan while zoomed.
- Middle mouse drag: pan the zoom focus.
- Arrow keys: nudge the zoom focus.
- Virtual Joystick: shows an on-canvas joystick overlay for panning.
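One common way to implement "zoom around the cursor" is to keep the source point under the cursor fixed while the zoom factor changes. The sketch below assumes normalized coordinates in [0, 1], a zoom factor of at least 1, and a visible window of size 1/zoom centered on the focus point. The names `ZoomState`, `clampFocus`, and `zoomAroundCursor` are hypothetical and do not come from the OpenZoom sources.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical state: zoom factor and normalized focus (window center).
struct ZoomState {
    double zoom;
    double focusX;
    double focusY;
};

// Keeps the visible window (size 1/zoom) inside the source image.
inline double clampFocus(double f, double zoom) {
    const double half = 0.5 / zoom;
    return std::max(half, std::min(f, 1.0 - half));
}

// cursorX/cursorY are the cursor position within the view, in [0, 1].
inline ZoomState zoomAroundCursor(const ZoomState& s, double newZoom,
                                  double cursorX, double cursorY) {
    assert(s.zoom >= 1.0 && newZoom >= 1.0);
    // Source point currently under the cursor.
    const double px = s.focusX + (cursorX - 0.5) / s.zoom;
    const double py = s.focusY + (cursorY - 0.5) / s.zoom;
    // Choose the new focus so the same source point stays under the cursor.
    const double fx = px - (cursorX - 0.5) / newZoom;
    const double fy = py - (cursorY - 0.5) / newZoom;
    return ZoomState{newZoom, clampFocus(fx, newZoom), clampFocus(fy, newZoom)};
}
```

Panning with the mouse wheel, middle-drag, or arrow keys would then only adjust the focus and pass it through the same clamp.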
- Settings persist to %APPDATA%\OpenZoom\OpenZoom\settings.json, including the selected quick mode, current advanced configuration, and user-created quick options.
- Snapshots are written under output/img/ relative to the executable.
- Recordings are written under output/vid/ relative to the executable.
- OCR uses tesseract.exe. If it is not on PATH, set OPENZOOM_TESSERACT_PATH.
- VLM mode uses an OpenAI-compatible chat/completions endpoint.
- Configure VLM with:
  - OPENZOOM_VLM_API_URL
  - OPENZOOM_VLM_API_KEY
  - OPENZOOM_VLM_MODEL
  - optional OPENZOOM_VLM_PROMPT
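The environment variables above could be gathered into a single configuration object along these lines. This is a sketch only: `VlmConfig`, `processEnv`, and `loadVlmConfig` are hypothetical names, and treating the URL, key, and model as all required is an assumption based on only the prompt being marked optional.

```cpp
#include <cstdlib>
#include <functional>
#include <string>

// Hypothetical container for the VLM settings named in the list above.
struct VlmConfig {
    std::string apiUrl;
    std::string apiKey;
    std::string model;
    std::string prompt;  // optional; empty means "not overridden"
    bool valid = false;  // true when all required values are present
};

// Lookup abstraction so the loader can be tested without touching the real environment.
using EnvLookup = std::function<std::string(const std::string&)>;

// Reads a variable from the process environment; returns "" when unset.
inline std::string processEnv(const std::string& name) {
    const char* value = std::getenv(name.c_str());
    return value ? std::string(value) : std::string();
}

inline VlmConfig loadVlmConfig(const EnvLookup& lookup = processEnv) {
    VlmConfig cfg;
    cfg.apiUrl = lookup("OPENZOOM_VLM_API_URL");
    cfg.apiKey = lookup("OPENZOOM_VLM_API_KEY");
    cfg.model = lookup("OPENZOOM_VLM_MODEL");
    cfg.prompt = lookup("OPENZOOM_VLM_PROMPT");
    // Assumption: URL, key, and model are required; the prompt is optional.
    cfg.valid = !cfg.apiUrl.empty() && !cfg.apiKey.empty() && !cfg.model.empty();
    return cfg;
}
```

Calling `loadVlmConfig()` with no argument reads the real environment; a caller could disable VLM features in the UI when `valid` is false.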
- src/app/ - application lifecycle, settings persistence, and interaction wiring.
- src/capture/ - Media Foundation camera discovery and capture loop.
- src/common/ - CPU frame pipeline, image processing helpers, and Media Foundation recording wrapper.
- src/d3d12/ - Direct3D 12 presenter, swap chain, upload, and readback logic.
- src/cuda/ - CUDA interop surface and kernels.
- src/ui/ - Qt widgets, overlays, and event routing.
- include/openzoom/ - public headers mirroring the source layout.
- docs/ - architecture notes, code reference, progress tracking, and licensing docs.
- scripts/ - build, bundle, and validation helpers.
- docs/README.md for the architecture overview.
- docs/code_reference.md for the current class and file map.
- docs/hardcoded_paths.md for machine-specific defaults.
- docs/progress.md for implementation status.
- docs/THIRD_PARTY_LICENSES.md for redistribution notes.
This project is dual-licensed:
- GPL-3.0 via LICENSE
- Commercial licensing by direct arrangement with the project owner
By contributing, you agree that your changes may be distributed under both terms. Third-party notices are summarized in docs/THIRD_PARTY_LICENSES.md.