A high-performance C++20 perception pipeline designed for high-fidelity "glass-to-command" latency benchmarking. This project implements a deterministic vision loop optimized for real-time UAV tracking and drone-based multi-object tracking (MOT).
This repository provides a minimal yet robust perception stack capable of processing high-resolution video streams with deterministic timing. It is engineered to bridge the gap between AI inference benchmarks and physical hardware control loops.
- GPU Acceleration: Built-in support for NVIDIA CUDA via ONNX Runtime Execution Providers.
- Lock-Free Architecture: Thread communication over 64-byte aligned Single-Producer, Single-Consumer (SPSC) ring buffers to eliminate mutex-based jitter.
- Real-Time Determinism: Validated p99 latency consistency, suitable for high-rate flight controllers.
- Embedded Portability: 100% binary-compatible with NVIDIA Jetson (Orin/Xavier) platforms.
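The lock-free queue idea above can be illustrated with a minimal sketch. It assumes a fixed power-of-two capacity and a default-constructible element type; `SpscRing`, `try_push`, and `try_pop` are hypothetical names chosen for illustration and are not taken from this repository.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical SPSC ring buffer; name and API are illustrative only.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Producer thread only. Returns false when the ring is full.
    bool try_push(T value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full
        slots_[head & (Capacity - 1)] = std::move(value);
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T value = std::move(slots_[tail & (Capacity - 1)]);
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return value;
    }

private:
    // Each index sits on its own 64-byte cache line to avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by the producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by the consumer
    alignas(64) T slots_[Capacity]{};
};
```

Because exactly one thread writes each index, acquire/release ordering on the two counters is enough to hand slots over safely, and neither side ever blocks on a mutex.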
The pipeline consists of four dedicated POSIX threads operating in a zero-copy asynchronous chain:
```mermaid
graph LR
    A[Capture Stage<br/>OpenCV V4L2] -->|SPSC Queue| B[Preprocess Stage<br/>CHW / Normalization]
    B -->|SPSC Queue| C[Inference Stage<br/>YOLOv8 CUDA]
    C -->|SPSC Queue| D[Track & Command<br/>SORT / Centroid Error]
    subgraph "Latency Budget (p50)"
        B --- B1[3.9ms]
        C --- C1[9.4ms]
        D --- D1[1.3ms]
    end
```
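The "CHW / Normalization" step in the preprocess stage can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: `hwc_to_chw` is a hypothetical helper, scaling by 1/255 is assumed, and channel order is preserved (any BGR-to-RGB swap or resizing the real stage may perform is omitted).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved 8-bit HWC image into a planar float CHW tensor.
// Hypothetical helper: the 1/255 scaling and unchanged channel order are assumptions.
std::vector<float> hwc_to_chw(const std::vector<std::uint8_t>& hwc,
                              std::size_t height, std::size_t width,
                              std::size_t channels) {
    assert(hwc.size() == height * width * channels);
    std::vector<float> chw(hwc.size());
    const std::size_t plane = height * width;  // elements per channel plane
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                const std::size_t src = (y * width + x) * channels + c;
                const std::size_t dst = c * plane + y * width + x;
                chw[dst] = hwc[src] / 255.0f;  // map [0, 255] to [0, 1]
            }
        }
    }
    return chw;
}
```

For a 1x2 image with pixels `(0, 255, 0)` and `(255, 0, 255)`, the output holds the first channel of both pixels, then the second, then the third.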
The pipeline was tested on the VisDrone Dataset (UAV image sequences converted to video). On standard workstation GPU hardware, it sustains real-time throughput with no dropped frames:
| Metric | Result | Benchmark Context |
|---|---|---|
| Throughput | 64.04 FPS | 6.4x faster than source requirements |
| Frame Drop Rate | 0% | 146/146 frames processed without loss |
| Global Latency (p50) | 81.2 ms | Buffer-stable end-to-end chain |
| Inference Time (p50) | 9.4 ms | Sub-10ms YOLOv8 evaluation |
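The p50 values in the table are percentiles over per-frame latency samples. The README does not state which percentile method the benchmark uses, so the following is only a sketch of one common choice (nearest-rank); `percentile` is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Takes a copy so the caller's
// sample order is left untouched.
double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty());
    assert(p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const auto rank =
        static_cast<std::size_t>(std::ceil(p / 100.0 * samples.size()));
    return samples[std::max<std::size_t>(rank, 1) - 1];  // rank is 1-based
}
```

Calling `percentile(latencies_ms, 50.0)` and `percentile(latencies_ms, 99.0)` on the same sample set gives the median and tail figures; a large gap between the two indicates jitter.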
As shown below, the pipeline maintains a highly stable end-to-end latency, which is critical for PID control loops in UAV flight stability.
- Host OS: Ubuntu 22.04+ (Docker-only build)
- GPU Hardware: NVIDIA GPU with NVIDIA Container Toolkit installed.
- Build the Container:
  ```bash
  docker build -t latency_bench_gpu -f docker/Dockerfile.gpu .
  ```
- Execute Benchmark:
  ```bash
  docker run --gpus all --rm -v $(pwd):/workspace -w /workspace latency_bench_gpu \
    ./build/latency_bench --video data/uav_test.mp4 --out uav_gpu_eval --duration_sec 0 --conf 0.15
  ```
For non-NVIDIA hardware, use the standard Dockerfile:

```bash
./docker/run.sh
```

The evaluation video sample (data/uav_test.mp4) is derived from the VisDrone Dataset (AISKYEYE Team). The original image sequences were transcoded into a 10 FPS MP4 container to simulate realistic UAV sensor ingestion.
For real-world UAV deployment where lowest-age-of-data is prioritized over throughput:
- Reduce the ring buffer depth to `--buffers 3`.
- Enable thread affinity pinning with `--pin`.
- Enable real-time scheduling with `--realtime` (requires sudo permissions).
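How `--pin` and `--realtime` are implemented is not described here. On Linux, options like these are commonly built on `sched_setaffinity` and `sched_setscheduler`; the sketch below assumes that approach, and all function names are hypothetical. It is Linux-specific and relies on GNU extensions (`cpu_set_t`, `CPU_SET`), which `g++` enables by default.

```cpp
#include <cassert>
#include <sched.h>

// First CPU the calling thread is currently allowed to run on, or -1.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu) {
        if (CPU_ISSET(cpu, &set)) return cpu;
    }
    return -1;
}

// Pin the calling thread to a single CPU (pid 0 means "calling thread").
// Returns 0 on success, -1 on failure with errno set.
int pin_current_thread(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}

// Request the SCHED_FIFO real-time policy for the calling thread.
// Returns 0 on success, -1 on failure; without root or CAP_SYS_NICE this
// typically fails with errno == EPERM, which is why elevated rights are needed.
int make_current_thread_realtime(int priority) {
    sched_param param{};
    param.sched_priority = priority;  // valid SCHED_FIFO range on Linux: 1..99
    return sched_setscheduler(0, SCHED_FIFO, &param);
}
```

Pinning keeps a stage on one core so its cache stays warm, while SCHED_FIFO prevents ordinary tasks from preempting it. Both calls should be checked for failure, since containers often restrict the allowed CPU set and scheduling privileges.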


