End-to-End Vision Latency Benchmark (GPU/CPU)

A C++20 perception pipeline for benchmarking "glass-to-command" latency, the time from a frame being captured to a control command being issued. This project implements a deterministic vision loop optimized for real-time UAV tracking and drone-based multi-object tracking (MOT).

1. Project Overview

This repository provides a minimal yet robust perception stack capable of processing high-resolution video streams with deterministic timing. It is engineered to bridge the gap between AI inference benchmarks and physical hardware control loops.

Key Features

GPU Acceleration: Built-in support for NVIDIA CUDA via ONNX Runtime Execution Providers.
Lock-Free Architecture: Thread communication over 64-byte aligned Single-Producer, Single-Consumer (SPSC) ring buffers to eliminate mutex-based jitter.
Real-Time Determinism: Validated p99 latency consistency, suitable for high-rate flight controllers.
Embedded Portability: The same source is portable to NVIDIA Jetson (Orin/Xavier) platforms, which are ARM64 and therefore need their own build rather than the workstation binaries.
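
The lock-free hand-off between stages can be illustrated with a small ring buffer. This is a minimal sketch, not the repository's actual queue: the class name `SpscRing`, its `try_push`/`try_pop` methods, and the power-of-two capacity requirement are assumptions made for illustration. It shows the two ideas named above: one writer and one reader coordinating through atomic indices instead of a mutex, and 64-byte alignment so the two indices do not share a cache line.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical SPSC ring buffer sketch. Exactly one thread may call
// try_push and exactly one (other) thread may call try_pop.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert(Capacity >= 2 && (Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");

public:
    // Returns false when the ring is full; the caller decides whether
    // to drop the item or retry later.
    bool try_push(T value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == Capacity) return false;  // full
        slots_[head & (Capacity - 1)] = std::move(value);
        // Release: the slot write becomes visible before the new head.
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    // Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) return std::nullopt;  // empty
        T value = std::move(slots_[tail & (Capacity - 1)]);
        // Release: the slot is free for reuse once tail advances.
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }

private:
    // 64-byte alignment keeps the producer index, the consumer index,
    // and the storage on separate cache lines to avoid false sharing.
    alignas(64) std::atomic<std::size_t> head_{0};  // written by producer
    alignas(64) std::atomic<std::size_t> tail_{0};  // written by consumer
    alignas(64) std::array<T, Capacity> slots_{};
};
```

In this sketch a full ring makes `try_push` fail rather than block, which leaves the drop-or-retry policy to the calling stage.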

2. Technical Architecture

The pipeline consists of four dedicated POSIX threads operating in a zero-copy asynchronous chain:

graph LR
    A[Capture Stage<br/>OpenCV V4L2] -->|SPSC Queue| B[Preprocess Stage<br/>CHW / Normalization]
    B -->|SPSC Queue| C[Inference Stage<br/>YOLOv8 CUDA]
    C -->|SPSC Queue| D[Track & Command<br/>SORT / Centroid Error]
    
    subgraph "Latency Budget (p50)"
    B --- B1[3.9ms]
    C --- C1[9.4ms]
    D --- D1[1.3ms]
    end
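
The Preprocess stage in the diagram converts each frame to CHW layout and normalizes it. A minimal sketch of that conversion follows, assuming a tightly packed 8-bit HWC input and scaling to [0, 1] by dividing by 255. The function name `hwc_to_chw` and the scaling choice are illustrative assumptions; the repository may resize, reorder channels, or normalize differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: converts interleaved HWC uint8 pixels into
// planar CHW floats in [0, 1]. Channel order is left unchanged.
std::vector<float> hwc_to_chw(const std::vector<std::uint8_t>& hwc,
                              std::size_t height, std::size_t width,
                              std::size_t channels) {
    assert(hwc.size() == height * width * channels);
    std::vector<float> chw(hwc.size());
    const std::size_t plane = height * width;  // elements per channel plane
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                const std::size_t src = (y * width + x) * channels + c;
                const std::size_t dst = c * plane + y * width + x;
                chw[dst] = static_cast<float>(hwc[src]) / 255.0f;
            }
        }
    }
    return chw;
}
```

The output buffer can then be handed to the inference stage as a single contiguous tensor of shape (channels, height, width).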

3. Real-Time Performance & UAV Readiness

The pipeline was tested on the VisDrone Dataset (UAV image sequences converted to video). On standard workstation GPU hardware, it processes frames faster than the source delivers them, with no frame drops:

Performance Metrics (GPU Run)

| Metric | Result | Benchmark Context |
| --- | --- | --- |
| Throughput | 64.04 FPS | 6.4x the 10 FPS source frame rate |
| Frame Drop Rate | 0% | 146/146 frames processed without loss |
| Global Latency (p50) | 81.2 ms | Buffer-stable end-to-end chain |
| Inference Time (p50) | 9.4 ms | Sub-10 ms YOLOv8 evaluation |
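
The throughput ratio in the table can be reproduced with simple arithmetic. The sketch below uses hypothetical helper names (`realtime_factor`, `frame_period_ms`) and takes the 10 FPS source rate from the Dataset Attribution section.

```cpp
#include <cassert>

// Hypothetical helper: how many times faster the pipeline runs than
// the source delivers frames.
double realtime_factor(double processed_fps, double source_fps) {
    assert(processed_fps > 0.0 && source_fps > 0.0);
    return processed_fps / source_fps;
}

// Hypothetical helper: time between consecutive source frames in ms.
double frame_period_ms(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}
```

With the figures above, `realtime_factor(64.04, 10.0)` is about 6.4, and `frame_period_ms(10.0)` is 100 ms, so the 81.2 ms median end-to-end latency is shorter than one source frame period.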

Latency Jitter & Determinism

The pipeline maintains a stable end-to-end latency, which is critical for the PID control loops that keep a UAV stable in flight.
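
One common way to quantify this kind of stability is the gap between the p99 and p50 latency. The following is a minimal sketch using the nearest-rank percentile definition. The function names `percentile` and `tail_jitter_ms` are hypothetical, and the repository's own evaluation scripts may use a different percentile method.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least
// p percent of all samples are less than or equal to it.
double percentile(std::vector<double> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const auto rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(samples.size())));
    return samples[std::max<std::size_t>(rank, 1) - 1];
}

// Hypothetical jitter measure: distance between the tail (p99) and the
// median (p50) of per-frame latencies, in milliseconds.
double tail_jitter_ms(const std::vector<double>& latencies_ms) {
    return percentile(latencies_ms, 99.0) - percentile(latencies_ms, 50.0);
}
```

A small gap means the slowest frames take about as long as typical ones, which is the property a fixed-rate control loop depends on.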

4. Setup & Reproducibility

Prerequisites

Host OS: Ubuntu 22.04+ (Docker-only build)
GPU Hardware: NVIDIA GPU with NVIDIA Container Toolkit installed.

Installation (GPU-Enabled)

Build the Container:

docker build -t latency_bench_gpu -f docker/Dockerfile.gpu .

Execute Benchmark:

docker run --gpus all --rm -v "$(pwd)":/workspace -w /workspace latency_bench_gpu \
    ./build/latency_bench --video data/uav_test.mp4 --out uav_gpu_eval --duration_sec 0 --conf 0.15

Installation (CPU-Only)

For non-NVIDIA hardware, use the standard Dockerfile:

./docker/run.sh

5. Dataset Attribution

The evaluation video sample (data/uav_test.mp4) is derived from the VisDrone Dataset (AISKYEYE Team). The original image sequences were transcoded into a 10 FPS MP4 container to simulate realistic UAV sensor ingestion.

6. Optimization for Flight

For real-world UAV deployment, where the lowest age of data matters more than throughput:

Reduce the ring buffer depth with --buffers 3.
Enable thread affinity pinning with --pin.
Enable real-time scheduling with --realtime (requires root privileges, e.g. via sudo).
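
On Linux, affinity pinning and real-time scheduling are typically requested through the pthread API. The sketch below shows one way to do it and is not taken from the repository: the function names are hypothetical, and how `--pin` and `--realtime` are actually implemented is an assumption. It relies on the GNU extension `pthread_setaffinity_np`, which g++ exposes by default.

```cpp
#include <pthread.h>
#include <sched.h>
#include <cassert>
#include <cerrno>

// Hypothetical helper: restrict the calling thread to a single CPU core.
// Returns 0 on success or an errno-style error code.
int pin_current_thread(int cpu) {
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

// Hypothetical helper: request the SCHED_FIFO real-time policy for the
// calling thread. Without root or CAP_SYS_NICE this usually returns
// EPERM, which is why elevated privileges are needed.
int make_current_thread_realtime(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
}
```

Each pipeline thread would call these once at startup. Checking the return value matters, since a silent failure leaves the thread on the default scheduler.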
