
SuyashMullick/real-time-vision-latency-bench


End-to-End Vision Latency Benchmark (GPU/CPU)

License: MIT

A C++20 perception pipeline for benchmarking "glass-to-command" latency: the time from frame capture to the tracking command that a controller would consume. It implements a deterministic vision loop aimed at real-time UAV tracking and drone-based multi-object tracking (MOT).

[Figure: Annotated UAV Tracking]


1. Project Overview

This repository provides a minimal perception stack that processes high-resolution video streams with deterministic timing. It measures end-to-end pipeline latency rather than model inference time alone, bridging the gap between AI inference benchmarks and physical hardware control loops.

Key Features

  • GPU Acceleration: Built-in support for NVIDIA CUDA via ONNX Runtime Execution Providers.
  • Lock-Free Architecture: Threads communicate over 64-byte aligned Single-Producer, Single-Consumer (SPSC) ring buffers, avoiding mutex-induced jitter.
  • Real-Time Determinism: Tracks p99 (tail) latency to validate timing consistency, a requirement for high-rate flight controllers.
  • Embedded Portability: Designed to be portable to NVIDIA Jetson (Orin/Xavier) platforms.
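The lock-free hand-off between stages can be illustrated with a minimal bounded SPSC ring buffer. This is a generic sketch, not the repository's implementation. The class name `SpscRing` is hypothetical. It applies the 64-byte alignment to the producer and consumer indices so that they sit on separate cache lines; the project's actual layout may differ. Acquire/release ordering on the two indices lets each thread see the other's slot writes without a mutex.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>
#include <utility>

// Bounded single-producer/single-consumer ring buffer (hypothetical sketch).
// One slot stays empty to tell "full" from "empty", so capacity is N - 1.
// Slots are default-constructed up front, so T must be default-constructible.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2, "need at least one usable slot");

public:
    // Producer thread only. Returns false if the ring is full.
    bool try_push(T value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % N;
        if (next == tail_.load(std::memory_order_acquire)) {
            return false;
        }
        slots_[head] = std::move(value);
        head_.store(next, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer thread only. Returns std::nullopt if the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) {
            return std::nullopt;
        }
        std::optional<T> value{std::move(slots_[tail])};
        tail_.store((tail + 1) % N, std::memory_order_release);  // free the slot
        return value;
    }

private:
    // Each index sits on its own 64-byte cache line so the producer and
    // consumer do not false-share when they update them.
    alignas(64) std::atomic<std::size_t> head_{0};  // next slot to write
    alignas(64) std::atomic<std::size_t> tail_{0};  // next slot to read
    alignas(64) std::array<T, N> slots_{};
};
```

In a chain like this one, the capture thread would call `try_push` and the preprocess thread would call `try_pop`. With depth N, at most N - 1 frames can wait between two stages, so queue depth bounds how stale a frame can get.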

2. Technical Architecture

The pipeline consists of four dedicated POSIX threads operating in a zero-copy asynchronous chain:

graph LR
    A[Capture Stage<br/>OpenCV V4L2] -->|SPSC Queue| B[Preprocess Stage<br/>CHW / Normalization]
    B -->|SPSC Queue| C[Inference Stage<br/>YOLOv8 CUDA]
    C -->|SPSC Queue| D[Track & Command<br/>SORT / Centroid Error]
    
    subgraph "Latency Budget (p50)"
    B --- B1[3.9ms]
    C --- C1[9.4ms]
    D --- D1[1.3ms]
    end
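The preprocess stage's "CHW / Normalization" step reorders OpenCV's interleaved pixel layout into the planar tensor layout that ONNX models usually expect. This is a minimal sketch. It assumes the common YOLOv8 ONNX input convention (RGB channel order, CHW layout, values scaled to [0, 1]) and assumes resizing or letterboxing has already happened. The function name is hypothetical, and the repository's preprocessing may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert an interleaved 8-bit BGR image (HWC, OpenCV's
// default layout) into planar float RGB (CHW) scaled to [0, 1].
// Assumes the image was already resized/letterboxed to the model input size.
std::vector<float> bgr_hwc_to_rgb_chw(const std::uint8_t* bgr,
                                      std::size_t height,
                                      std::size_t width) {
    const std::size_t plane = height * width;  // pixels per channel plane
    std::vector<float> chw(3 * plane);
    constexpr float kScale = 1.0f / 255.0f;
    for (std::size_t i = 0; i < plane; ++i) {
        const std::uint8_t* px = bgr + 3 * i;  // B, G, R bytes of pixel i
        chw[0 * plane + i] = px[2] * kScale;   // R plane
        chw[1 * plane + i] = px[1] * kScale;   // G plane
        chw[2 * plane + i] = px[0] * kScale;   // B plane
    }
    return chw;
}
```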

3. Real-Time Performance & UAV Readiness

The pipeline was evaluated on footage from the VisDrone Dataset (UAV image sequences converted to video). On a workstation-class NVIDIA GPU, it sustains faster-than-real-time throughput with no dropped frames:

Performance Metrics (GPU Run)

| Metric | Result | Context |
|---|---|---|
| Throughput | 64.04 FPS | 6.4x the 10 FPS source frame rate |
| Frame drop rate | 0% | 146/146 frames processed without loss |
| End-to-end latency (p50) | 81.2 ms | Full capture-to-command chain, including buffering |
| Inference time (p50) | 9.4 ms | YOLOv8 evaluation under 10 ms |
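Comparing the per-stage p50 latency budget with the end-to-end p50 shows where the time goes. This is only an approximation, for two reasons: medians are not strictly additive, and the budget lists no capture-stage time. The remainder is presumably capture time plus the time frames spend waiting in the SPSC queues.

```math
\begin{aligned}
t_{\text{stages}} &= 3.9 + 9.4 + 1.3 = 14.6\ \text{ms} \\
t_{\text{e2e}} - t_{\text{stages}} &= 81.2 - 14.6 = 66.6\ \text{ms} \\
\text{headroom} &= 64.04\ \text{FPS} \,/\, 10\ \text{FPS} \approx 6.4
\end{aligned}
```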

Latency Jitter & Determinism

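As shown below, the pipeline keeps end-to-end latency highly stable. This is critical for PID control loops in UAV flight, since a controller tuned for a given sensing delay loses stability margin when that delay varies from frame to frame.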

[Figures: Latency Jitter; Stage Breakdown]
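Percentile figures such as p50 and p99 can be computed from per-frame latency samples with a nearest-rank percentile. One simple jitter measure is the spread between p99 and p50. This is a minimal sketch: the function names are hypothetical, and the repository may use a different percentile definition, such as linear interpolation.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: nearest-rank percentile (0 < p <= 100) of latency
// samples in milliseconds. Takes the vector by value because it sorts a copy.
double percentile_ms(std::vector<double> samples, double p) {
    if (samples.empty() || p <= 0.0 || p > 100.0) {
        throw std::invalid_argument("need samples and 0 < p <= 100");
    }
    std::sort(samples.begin(), samples.end());
    // Nearest rank: the ceil(p / 100 * n)-th smallest sample (1-based).
    const auto rank = static_cast<std::size_t>(
        std::ceil(p / 100.0 * static_cast<double>(samples.size())));
    return samples[rank - 1];
}

// Hypothetical jitter figure: how far the tail sits above the median.
double tail_spread_ms(const std::vector<double>& samples) {
    return percentile_ms(samples, 99.0) - percentile_ms(samples, 50.0);
}
```

With only a few hundred frames (this run processed 146), p99 is effectively the worst or second-worst sample, so longer runs give a more meaningful tail estimate.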


4. Setup & Reproducibility

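Prerequisites

  • Docker
  • For the GPU build: an NVIDIA driver and the NVIDIA Container Toolkit (needed for docker run --gpus)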

Installation (GPU-Enabled)

  1. Build the image:

    docker build -t latency_bench_gpu -f docker/Dockerfile.gpu .
  2. Run the benchmark:

    docker run --gpus all --rm -v "$(pwd)":/workspace -w /workspace latency_bench_gpu \
        ./build/latency_bench --video data/uav_test.mp4 --out uav_gpu_eval --duration_sec 0 --conf 0.15

Installation (CPU-Only)

For non-NVIDIA hardware, use the standard (CPU) Dockerfile through the helper script:

    ./docker/run.sh

5. Dataset Attribution

The evaluation video sample (data/uav_test.mp4) is derived from the VisDrone Dataset (AISKYEYE Team). The original image sequences were transcoded into a 10 FPS MP4 container to simulate realistic UAV sensor ingestion.


6. Optimization for Flight

For real-world UAV deployment, where minimizing the age of data (time since capture) matters more than throughput:

  • Reduce the ring buffer depth with --buffers 3, so fewer frames queue up waiting to be processed.
  • Pin threads to CPU cores with --pin.
  • Enable real-time scheduling with --realtime (requires elevated privileges, e.g. sudo).
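On Linux, thread pinning and real-time scheduling usually come down to two POSIX-thread calls: CPU affinity and the SCHED_FIFO policy. This sketch applies both to the calling thread. It is an assumption about what --pin and --realtime do, not the project's code, and the helper names are hypothetical.

```cpp
#include <pthread.h>
#include <sched.h>

#include <cstdio>
#include <cstring>

// Hypothetical helper: pin the calling thread to one CPU core. Linux-only;
// pthread_setaffinity_np is a GNU extension that g++ exposes by default.
bool pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    const int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0) {
        std::fprintf(stderr, "affinity failed: %s\n", std::strerror(rc));
        return false;
    }
    return true;
}

// Hypothetical helper: switch the calling thread to SCHED_FIFO. Valid
// priorities on Linux are 1..99. Without root, CAP_SYS_NICE, or a sufficient
// RLIMIT_RTPRIO, the call fails with EPERM.
bool make_current_thread_realtime(int priority) {
    sched_param param{};
    param.sched_priority = priority;
    const int rc = pthread_setschedparam(pthread_self(), SCHED_FIFO, &param);
    if (rc != 0) {
        std::fprintf(stderr, "SCHED_FIFO failed: %s\n", std::strerror(rc));
        return false;
    }
    return true;
}
```

Inside Docker, SCHED_FIFO also needs the capability granted explicitly, for example by adding --cap-add=SYS_NICE to the docker run command, because Docker's default capability set omits it.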
