Master's Thesis Project — Full-stack autonomous UAV landing system on a moving UGV, built on PX4 SITL, ROS 2 Humble, and Gazebo Garden. Combines multi-scale AprilTag perception, multi-sensor EKF state estimation, and FOV-constrained linear MPC, orchestrated through a reactive Behavior Tree.
Demo video: straight_line_traj_landing.webm
This repository presents a complete vision-based autonomous landing system for a quadrotor UAV on a dynamically moving ground vehicle (UGV). The system integrates perception, state estimation, and constrained optimal control within a unified PX4–ROS 2–Gazebo simulation framework.
Unlike static landing systems, this work addresses the significantly harder problem of dynamic target tracking, where the UAV must continuously estimate, predict, and track a moving platform under perception uncertainty and camera field-of-view (FOV) constraints.
The system combines:
- Multi-scale AprilTag bundle detection for robust perception
- Extended Kalman Filter (EKF) for target state estimation
- Relative-state formulation for control
- Model Predictive Control (MPC) with visibility constraints
- End-to-end UAV–UGV autonomous landing pipeline
- Multi-scale AprilTag bundle for robust detection across altitude
- Rover-side EKF for global state estimation
- Relative state control formulation
- MPC with field-of-view (FOV) constraints
- Full integration with PX4 SITL, ROS 2, and Gazebo
px4_vision_landing_moving_target
├── analysis
│ ├── Plot1_Trajectory.png
│ ├── Plot2_VelocityTracking.png
│ ├── Plot3_LandingFunnel.png
│ ├── Plot4_FilterPerformance.png
│ ├── Plot5_TrackingErrors.png
│ └── plot_rover_validation.py
├── images
│ ├── MPC.png
│ ├── tag_bundle_default_view.png
│ ├── tag_bundle_topview.png
│ └── tf_tree.png
├── launch
├── config
├── px4_vision_landing_moving_target
├── models
├── worlds
├── README.md
└── setup.py
In scope:
- Moving target tracking using vision + EKF.
- Relative pose estimation in global frame.
- MPC-based trajectory generation and control.
- Autonomous mission execution.

Out of scope:
- Real-world deployment (hardware validation).
- Adverse weather / lighting conditions.
- Multi-agent coordination.
- Obstacle avoidance.

Assumptions:
- AprilTag bundle is rigidly mounted on UGV.
- Camera calibration is perfect (simulation).
- Communication between UAV and rover is reliable.
- No external disturbances (wind, sensor bias).
flowchart TD
A[PX4 SITL + Gazebo]
B[Camera Sensor]
C[AprilTag Detection]
D[TF Transform System]
E[Global Tag Pose Estimation]
F[EKF on Rover]
G[Relative State Computation]
H[MPC Controller]
I[PX4 Offboard Control]
J[Behavior Tree]
K[Logging & Analysis]
A --> B --> C --> D --> E
E --> G
F --> G
G --> H --> I --> A
J --> H
E --> K
F --> K
| # | Contribution | Technical Detail |
|---|---|---|
| 1 | Multi-scale AprilTag bundle | Tags 11 (150mm), 19 (50mm), 23 (400mm) at fixed offsets — ensures detection from ≥8m down to touchdown |
| 2 | Multi-sensor rover EKF | Fuses GPS position, wheel odometry velocity, IMU yaw rate, and AprilTag vision corrections |
| 3 | FOV-constrained linear MPC | CVXPY/OSQP solver with soft FOV constraints, jerk penalty, and dynamic safety margin |
| 4 | Dual-mode control handover | Seamless GPS-chase → vision-active transition with timer-based fallback |
| 5 | Reactive Behavior Tree | py_trees with battery failsafe, tag-loss timeout, and RTL abort |
| 6 | Full PX4 SITL integration | NED↔ENU frame handling, offboard heartbeat, PX4 command abstraction |
A single AprilTag has a fundamental limitation: detection range scales linearly with tag size. A tag large enough to detect at 8m altitude becomes unusable for precise landing at 0.5m. Conversely, a small tag sized for precision is invisible at altitude.
The solution is a multi-scale bundle — three tags of different sizes rigidly mounted on the rover, treated as a single logical frame by apriltag_ros. The bundle provides:
- Continuous observability across all flight phases (high altitude to touchdown)
- Graceful redundancy — any single tag detection is sufficient to keep the bundle visible
- Pose accuracy improvement near landing — the smaller tag dominates at close range, reducing pose noise
- No blind spot during the critical descent transition
# config/apriltag.yaml
tag_bundles:
landing_pad:
- id: 23, size: 0.40m, offset: +0.15m # Large: long-range detection
- id: 11, size: 0.15m, offset: -0.10m # Medium: mid-range tracking
- id: 19, size: 0.05m, offset: 0.00m # Small: high-precision landing

The large 400mm tag (ID 23) is reliably detected at 6–8m altitude. As the drone descends below ~2m, the 50mm tag (ID 19) dominates the pose estimate with sub-centimeter accuracy.
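To make the range argument concrete, the sketch below estimates how many pixels each tag in the bundle spans at a given altitude under a pinhole camera model. The focal length (`FOCAL_PX`), the usable pixel band (`MIN_PX`, `MAX_PX`), and the helper names are illustrative assumptions, not values taken from this repository, so the altitudes it prints will not match the simulation exactly.

```python
# Illustrative pinhole-camera estimate; FOCAL_PX, MIN_PX and MAX_PX are assumed values.
FOCAL_PX = 500.0   # assumed focal length in pixels
MIN_PX = 20.0      # assumed smallest tag edge (px) a detector can decode
MAX_PX = 400.0     # assumed largest tag edge (px) that still fits in the image

# Tag id -> edge length in meters, as listed for the bundle above.
TAG_SIZES_M = {23: 0.40, 11: 0.15, 19: 0.05}


def apparent_size_px(tag_size_m, altitude_m):
    """Projected tag edge length in pixels for a camera looking straight down."""
    return FOCAL_PX * tag_size_m / altitude_m


def usable_tags(altitude_m):
    """Tag ids whose projected size falls inside the assumed usable pixel band."""
    return sorted(
        tag_id
        for tag_id, size in TAG_SIZES_M.items()
        if MIN_PX <= apparent_size_px(size, altitude_m) <= MAX_PX
    )


if __name__ == "__main__":
    for z in (8.0, 4.0, 2.0, 0.2):
        print(f"z = {z:4.1f} m -> usable tags: {usable_tags(z)}")
```

Because the projected size is proportional to tag size divided by altitude, each tag covers a fixed altitude band, and the three bands overlap so that at least one tag stays usable during the descent.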
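| Default View | Top View |
|---|---|
| ![Tag bundle, default view](images/tag_bundle_default_view.png) | ![Tag bundle, top view](images/tag_bundle_topview.png) |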
Correct frame management is critical for this system. All coordinate transforms are handled through ROS 2 tf2, and the entire pipeline is built on two conventions:
- ROS / ENU (East-North-Up): used for all internal ROS topics, the `map` frame, and control inputs
- PX4 / NED (North-East-Down): used for all PX4 messages (`/fmu/in/*`, `/fmu/out/*`)
All nodes that interface with PX4 perform explicit NED↔ENU conversion.
| Frame | Description | Source |
|---|---|---|
| `map` | Global ENU reference frame | PX4 |
| `base_link` | UAV body frame | PX4 |
| `camera_link` | Downward-facing camera frame | Static TF |
| `tag_bundle` | AprilTag bundle frame | Detection |
| `ugv_base` | Ground vehicle frame | EKF |
X_enu = Y_ned (East → X)
Y_enu = X_ned (North → Y)
Z_enu = -Z_ned (Down → Up, negated)
This conversion is applied explicitly in `OffboardExperimentManager`, `RoverRelativeStateNode`, `VisionGuidanceController`, and `TFStateLogger`.
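The axis swap can be sketched in a few lines of plain Python. The function names below are hypothetical helpers for illustration and are not the ones used in the nodes listed above. The yaw conversion assumes the usual conventions: NED yaw measured clockwise from North, ENU yaw measured counter-clockwise from East.

```python
import math


def ned_to_enu(v_ned):
    """Convert a (north, east, down) vector to (east, north, up)."""
    north, east, down = v_ned
    return (east, north, -down)


def enu_to_ned(v_enu):
    """Convert an (east, north, up) vector to (north, east, down)."""
    east, north, up = v_enu
    return (north, east, -up)


def yaw_ned_to_enu(yaw_ned):
    """Convert a heading from NED to ENU and wrap it to [-pi, pi]."""
    yaw_enu = math.pi / 2.0 - yaw_ned
    return math.atan2(math.sin(yaw_enu), math.cos(yaw_enu))


if __name__ == "__main__":
    # 1 m north, 2 m east, 3 m below the origin in NED.
    print(ned_to_enu((1.0, 2.0, 3.0)))
```

The position mapping is its own inverse apart from argument order, which is why the same swap appears on both the `/fmu/in/*` and `/fmu/out/*` sides.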
For a moving target, instantaneous pose measurements are insufficient for control. The MPC requires smooth, continuous estimates of the rover's position, velocity, and heading in the global frame. Raw GPS alone is too noisy (±0.5–1.5m) and provides no velocity. Wheel odometry drifts over time. Vision (AprilTag) is intermittently unavailable. A multi-sensor EKF fuses all three to produce a reliable state estimate at 20Hz.
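The EKF models the UGV with a constant-velocity model and maintains a 5-dimensional state:

$$\mathbf{x} = [x,\ y,\ v_x,\ v_y,\ \psi]^T$$

where $(x, y)$ and $(v_x, v_y)$ are the rover position and velocity in the ENU `map` frame and $\psi$ is its yaw. The prediction step is:

$$\mathbf{x}_{k+1} = F\,\mathbf{x}_k + B\,\omega_k + \mathbf{w}_k$$

where:

$$F = \begin{bmatrix} 1 & 0 & \Delta t & 0 & 0 \\ 0 & 1 & 0 & \Delta t & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}, \quad B = [0,\ 0,\ 0,\ 0,\ \Delta t]^T$$

with $\omega_k$ the IMU yaw rate, and process noise:

$$\mathbf{w}_k \sim \mathcal{N}(0,\ Q)$$

The measurement consists of the detected tag pose:

$$\mathbf{z}_k = H\,\mathbf{x}_k + \mathbf{v}_k$$

where:

$$H = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$

and:

$$\mathbf{v}_k \sim \mathcal{N}(0,\ R)$$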
The EKF runs three separate update steps per cycle, each with its own measurement model and noise covariance:
| Sensor | Measurement | Noise (R diagonal) | Update Rate |
|---|---|---|---|
| GPS | `[x, y]` position (ENU) | `[1.5, 1.5]` m² | ~10Hz async |
| Wheel Odometry | `[vx, vy]` velocity (ENU) | `[0.1, 0.1]` (m/s)² | ~20Hz async |
| IMU | yaw rate (control input) | `Q[4,4] = 0.05` | 20Hz prediction |
| AprilTag vision | `[x, y]` position + yaw (ENU) | `[0.05, 0.05]` m² | intermittent |
Vision updates are the highest-trust measurement and significantly correct accumulated GPS drift when the tag is visible. Yaw is initialized from GPS course-over-ground (requires >15cm motion) and continuously fused with vision-derived heading.
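The predict/update cycle described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the node's actual code: the state ordering `[x, y, vx, vy, yaw]` and the GPS/odometry noise values are taken from this README, while the vision yaw variance, the function names, and the way `Q` is applied per step are assumptions. With a constant-velocity model the Jacobians are constant, so the filter reduces to a linear Kalman filter here.

```python
import numpy as np

DT = 1.0 / 20.0  # 20 Hz prediction step

Q = np.diag([0.01, 0.01, 0.1, 0.1, 0.01])   # process noise [x, y, vx, vy, yaw]
R_GPS = np.diag([1.5, 1.5])                 # GPS position noise
R_ODOM = np.diag([0.1, 0.1])                # wheel odometry velocity noise
R_VISION = np.diag([0.05, 0.05, 0.05])      # third entry (yaw) is an assumed value

H_GPS = np.array([[1.0, 0, 0, 0, 0], [0, 1.0, 0, 0, 0]])
H_ODOM = np.array([[0, 0, 1.0, 0, 0], [0, 0, 0, 1.0, 0]])
H_VISION = np.array([[1.0, 0, 0, 0, 0], [0, 1.0, 0, 0, 0], [0, 0, 0, 0, 1.0]])


def predict(x, P, yaw_rate):
    """Constant-velocity prediction; the IMU yaw rate acts as a control input."""
    F = np.eye(5)
    F[0, 2] = DT
    F[1, 3] = DT
    x = F @ x
    x[4] += yaw_rate * DT
    return x, F @ P @ F.T + Q


def update(x, P, z, H, R, yaw_index=None):
    """Standard Kalman update; optionally wraps the yaw innovation to [-pi, pi]."""
    y = z - H @ x
    if yaw_index is not None:
        y[yaw_index] = np.arctan2(np.sin(y[yaw_index]), np.cos(y[yaw_index]))
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    return x + K @ y, (np.eye(5) - K @ H) @ P


if __name__ == "__main__":
    x, P = np.zeros(5), np.eye(5)
    x, P = predict(x, P, yaw_rate=0.1)
    x, P = update(x, P, np.array([1.0, 1.0]), H_GPS, R_GPS)
    x, P = update(x, P, np.array([0.5, 0.0]), H_ODOM, R_ODOM)
    x, P = update(x, P, np.array([1.0, 1.0, 0.1]), H_VISION, R_VISION, yaw_index=2)
    print(np.round(x, 3))
```

Each sensor gets its own `H` and `R`, so an update can be applied whenever that sensor's message arrives without changing the prediction step.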
- Publishes: `/rover/ekf_odom` (`nav_msgs/Odometry`) at 20Hz
- Broadcasts TF: `map → r1_rover/base_link`
- Uses `ReentrantCallbackGroup` to allow parallel sensor callbacks without blocking the prediction timer
- EKF provides velocity estimates, which are critical for prediction
- Reduces high-frequency noise in AprilTag detection
- Enables smooth and stable control inputs
Without EKF, the system suffers from:
- Noisy control commands
- Poor tracking of moving targets
- Instability in descent phase
The control problem is formulated in terms of the relative state between UAV and UGV:

$$p^{rel} = p_{drone} - p_{tag}$$

The goal is:

$$p^{rel} \rightarrow 0$$

Instead of chasing a moving target globally, the UAV stabilizes the relative state to zero.
This simplifies the problem from:
- Tracking a moving reference
to:
- Regulating a system to equilibrium
- Simplifies controller design
- Naturally handles target motion
- Improves stability and convergence
- Decouples global motion from control
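As a small illustration of why the relative formulation turns tracking into regulation, the sketch below computes the relative state from two global ENU states. The function name and tuple layout are assumptions made for this example. When the drone matches the rover's velocity directly above it, the relative state is zero no matter where the pair is in the `map` frame.

```python
def relative_state(p_drone, v_drone, p_tag, v_tag):
    """Relative position and velocity of the drone with respect to the tag (2D, ENU)."""
    p_rel = (p_drone[0] - p_tag[0], p_drone[1] - p_tag[1])
    v_rel = (v_drone[0] - v_tag[0], v_drone[1] - v_tag[1])
    return p_rel, v_rel


if __name__ == "__main__":
    # Same relative geometry at two different global positions.
    print(relative_state((10.0, 5.0), (0.5, 0.0), (10.0, 5.0), (0.5, 0.0)))
    print(relative_state((42.0, -3.0), (0.5, 0.0), (42.0, -3.0), (0.5, 0.0)))
```

Both calls return a zero relative state, which is the equilibrium the controller regulates to.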
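A traditional PID controller is a poor fit for this problem because it:
- Cannot enforce constraints such as velocity limits or FOV bounds
- Does not predict future target motion
- Struggles with moving targets, since it reacts only to the current error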
MPC solves these issues using optimization-based control.
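The MPC minimizes, over a horizon of $N$ steps:

$$J = \sum_{k=0}^{N-1} \left( x_{k+1}^T Q\, x_{k+1} + (u_k - v_{tag})^T R\, (u_k - v_{tag}) + \Delta u_k^T R_{diff}\, \Delta u_k + W_{slack}\, \|s_k\|^2 \right)$$

where $x_k$ is the relative position, $u_k$ the velocity command, $v_{tag}$ the tag velocity, $\Delta u_k = u_k - u_{k-1}$, and $s_k$ the FOV slack.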
The MPC is formulated as a quadratic program (QP) solved at 20Hz using CVXPY with the OSQP backend. The state is the 2D relative position of the drone with respect to the tag, and the control input is the 2D horizontal velocity command.
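State: $x_k = p^{rel}_k \in \mathbb{R}^2$ (relative position of the drone with respect to the tag)

Input: $u_k \in \mathbb{R}^2$ (horizontal velocity command)

Dynamics: $x_{k+1} = x_k + (u_k - v_{tag})\,\Delta t$

The tag velocity $v_{tag}$ is taken from the rover EKF estimate and enters the dynamics as a feedforward term.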
| Weight | Value | Purpose |
|---|---|---|
| `Q` | `14 × I₂` | Position error: pull drone toward tag |
| `R` | `8 × I₂` | Velocity effort: command drone to match tag speed |
| `R_diff` | `4 × I₂` | Jerk penalty: smooth acceleration profile |
| `W_slack` | `1000` | FOV constraint violation: soft enforcement |
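To show how these weights trade off against each other, the sketch below evaluates a quadratic objective for a candidate input sequence using NumPy. It does not call CVXPY or OSQP and is not the repository's solver code. The cost structure (position error, mismatch between commanded and tag velocity, and input change) and the integrator dynamics $x_{k+1} = x_k + (u_k - v_{tag})\,\Delta t$ are assumptions chosen to match the purposes listed in the table; the slack term is left out.

```python
import numpy as np

DT = 0.05                   # 20 Hz control period
Q = np.eye(2) * 14.0        # position error weight
R = np.eye(2) * 8.0         # velocity mismatch weight
R_DIFF = np.eye(2) * 4.0    # input change weight


def rollout(x0, u_seq, v_tag):
    """Propagate the relative position with x_{k+1} = x_k + (u_k - v_tag) * DT."""
    xs = [np.asarray(x0, dtype=float)]
    for u in u_seq:
        xs.append(xs[-1] + (np.asarray(u, dtype=float) - v_tag) * DT)
    return np.array(xs)


def mpc_cost(x0, u_seq, v_tag, u_prev):
    """Assumed objective: sum of weighted position, velocity and input-change terms."""
    v_tag = np.asarray(v_tag, dtype=float)
    xs = rollout(x0, u_seq, v_tag)
    cost, last_u = 0.0, np.asarray(u_prev, dtype=float)
    for x, u in zip(xs[1:], np.asarray(u_seq, dtype=float)):
        du = u - last_u
        cost += x @ Q @ x + (u - v_tag) @ R @ (u - v_tag) + du @ R_DIFF @ du
        last_u = u
    return float(cost)


if __name__ == "__main__":
    v_tag = (0.5, 0.0)
    hold = mpc_cost((1.0, 0.0), [(0.5, 0.0)] * 20, v_tag, v_tag)   # match tag speed
    close = mpc_cost((1.0, 0.0), [(0.0, 0.0)] * 20, v_tag, v_tag)  # close the gap
    print(hold, close)
```

With a 1 m offset, matching the tag speed keeps the position error constant, while slowing down pays a velocity penalty but reduces the error; the optimizer searches for the best balance between the two.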
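Hard (velocity saturation):

$$\|u_k\|_\infty \leq v_{max}$$

Soft (dynamic FOV footprint):

$$|x_k| \leq b_{fov}(z) + s_k, \quad s_k \geq 0$$

where the FOV bound adapts with altitude:

$$b_{fov}(z) = z \tan\left(\tfrac{\theta_{fov}}{2}\right) - m$$

with $\theta_{fov}$ the camera field of view and $m$ the safety margin. This ensures the tag remains inside the camera frame at all altitudes. The soft slack variable $s_k$ is penalized by $W_{slack}$, so the constraint can be violated briefly without making the QP infeasible.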
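The altitude dependence of the FOV bound can be sketched geometrically. The snippet below assumes a downward-facing camera whose ground footprint half-width is `z * tan(half_fov)`, reduced by a safety margin. The half-angle, the margin, and the function names are hypothetical values for illustration; the repository's actual bound may be computed differently.

```python
import math


def fov_half_width(altitude_m, half_fov_rad, margin_m=0.0):
    """Ground half-width visible to a nadir camera, shrunk by a safety margin."""
    return max(0.0, altitude_m * math.tan(half_fov_rad) - margin_m)


def fov_slack(p_rel, altitude_m, half_fov_rad, margin_m=0.0):
    """Per-axis amount by which |p_rel| exceeds the FOV bound (0 when inside)."""
    bound = fov_half_width(altitude_m, half_fov_rad, margin_m)
    return tuple(max(0.0, abs(c) - bound) for c in p_rel)


if __name__ == "__main__":
    half_fov = math.radians(40.0)  # assumed camera half-angle
    for z in (4.0, 2.0, 1.0):
        print(z, round(fov_half_width(z, half_fov, margin_m=0.2), 3))
```

The bound shrinks as the drone descends, so the same lateral offset that is acceptable at 4 m produces a non-zero slack, and therefore a penalty, close to the ground.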
Once landing is triggered via the /landing/trigger_landing ROS 2 service, the controller enters a funnel descent mode:
- The lookahead time decreases linearly with altitude: $t_{la} = 0.5 \cdot (z - z_{td})$
- The allowed horizontal error shrinks with altitude: $\epsilon_{max} = \max(0.2,\ 0.4 \cdot z)$
- Descent only proceeds when $|p^{rel}| < \epsilon_{max}$, ensuring precision at touchdown
- Touchdown is declared when $z \leq 0.65$ m and $|p^{rel}| < 0.15$ m, and final accuracy is logged in millimeters
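The funnel rules above translate almost directly into code. The sketch below is a plain-Python illustration with hypothetical function names. It assumes the touchdown altitude $z_{td}$ equals the 0.65 m threshold and clamps the lookahead time at zero; both are assumptions, not confirmed details of the controller.

```python
import math

Z_TOUCHDOWN = 0.65        # m, touchdown altitude threshold (assumed equal to z_td)
TOUCHDOWN_RADIUS = 0.15   # m, required horizontal accuracy at touchdown


def lookahead_time(z, z_td=Z_TOUCHDOWN):
    """t_la = 0.5 * (z - z_td), clamped at zero (the clamp is an assumption)."""
    return max(0.0, 0.5 * (z - z_td))


def max_horizontal_error(z):
    """eps_max = max(0.2, 0.4 * z): the cone of acceptance."""
    return max(0.2, 0.4 * z)


def descent_allowed(p_rel, z):
    """Descent proceeds only while the horizontal error is inside the funnel."""
    return math.hypot(p_rel[0], p_rel[1]) < max_horizontal_error(z)


def touchdown_reached(p_rel, z):
    """Touchdown is declared at low altitude with a small horizontal error."""
    return z <= Z_TOUCHDOWN and math.hypot(p_rel[0], p_rel[1]) < TOUCHDOWN_RADIUS


if __name__ == "__main__":
    for z in (4.0, 2.0, 1.0, 0.6):
        print(z, max_horizontal_error(z), descent_allowed((0.5, 0.5), z))
```

A 0.7 m horizontal error is accepted at 4 m but blocks the descent at 1 m, which is the narrowing behavior the funnel is meant to enforce.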
Independent of MPC, the yaw is controlled to align the drone with the rover's velocity vector:
- Rover moving ($|v_{tag}| > 0.15$ m/s): $\dot{\psi}_{cmd} = \text{clip}(0.6 \cdot \psi_{err},\ \pm0.4\ \text{rad/s})$
- Rover stationary: look-at control based on tag body-frame angle
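The moving-rover branch of the yaw law can be sketched as a clipped proportional controller. The gain, rate limit, and speed threshold come from the text above. The function names, the angle wrapping, and returning `None` for the stationary case (where look-at control would take over) are assumptions made for this illustration.

```python
import math

K_YAW = 0.6            # proportional gain
MAX_YAW_RATE = 0.4     # rad/s
MIN_TAG_SPEED = 0.15   # m/s, below this the rover counts as stationary


def wrap_angle(angle):
    """Wrap an angle to [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def yaw_rate_command(yaw, v_tag):
    """Yaw rate that turns the drone toward the rover's velocity direction (ENU)."""
    speed = math.hypot(v_tag[0], v_tag[1])
    if speed <= MIN_TAG_SPEED:
        return None  # stationary rover: look-at control is handled separately
    yaw_err = wrap_angle(math.atan2(v_tag[1], v_tag[0]) - yaw)
    return max(-MAX_YAW_RATE, min(MAX_YAW_RATE, K_YAW * yaw_err))


if __name__ == "__main__":
    print(yaw_rate_command(0.0, (0.5, 0.05)))  # small error, unsaturated
    print(yaw_rate_command(0.0, (0.0, 0.5)))   # 90 degree error, saturated
```

Wrapping the error before applying the gain avoids a full-turn command when the heading crosses the ±π boundary.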
The mission is orchestrated by a py_trees Behavior Tree running inside OffboardExperimentManager at 20Hz. The BT is tick-based and reactive — every tick re-evaluates conditions, enabling immediate response to sensor events like tag loss or battery drop.
Unlike finite state machines, BTs allow:
- Better scalability
- Parallel condition monitoring
- Robust recovery from failures
The landing mission is decomposed into the following high-level stages:
| Node | Type | Returns SUCCESS when | Returns FAILURE when |
|---|---|---|---|
| `SetupOffboard` | Action | Armed and OFFBOARD mode active | - |
| `Takeoff` | Action | `\|z - target\| < 0.3m` | - |
| `IsBatteryHealthy` | Condition | Battery > 15% | Battery ≤ 15% |
| `IsTagVisible` | Condition | `/landing/tag_visible_flag` is True | Tag not visible |
| `VisionActiveMonitor` | Action | `external_mission_state == "LANDED"` | Tag lost > `tag_loss_timeout` seconds |
| `SearchChase` | Action | Never (always RUNNING) | - |
| `DisarmAndLand` | Action | Always (triggers PX4 land mode) | - |
| `AbortMission` | Action | Never (always RUNNING after trigger) | - |
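The tick semantics in the table can be illustrated without any dependencies. The sketch below is a plain-Python stand-in and deliberately does not use the py_trees API; the class and function names mirror the table, but their signatures and internals are assumptions. It shows how re-evaluating conditions on every tick lets a tag-loss timeout or a battery drop change the outcome immediately.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = "SUCCESS"
    FAILURE = "FAILURE"
    RUNNING = "RUNNING"


def is_battery_healthy(battery_percent):
    """Condition: SUCCESS only while the battery is above 15 %."""
    return Status.SUCCESS if battery_percent > 15.0 else Status.FAILURE


class VisionActiveMonitor:
    """RUNNING while tracking, SUCCESS on landing, FAILURE after a tag-loss timeout."""

    def __init__(self, tag_loss_timeout=2.0):
        self.tag_loss_timeout = tag_loss_timeout
        self._last_seen = None

    def tick(self, now, tag_visible, mission_state):
        if mission_state == "LANDED":
            return Status.SUCCESS
        if tag_visible or self._last_seen is None:
            self._last_seen = now  # reset the loss timer
        if now - self._last_seen > self.tag_loss_timeout:
            return Status.FAILURE
        return Status.RUNNING


if __name__ == "__main__":
    monitor = VisionActiveMonitor(tag_loss_timeout=2.0)
    for t, visible in [(0.0, True), (1.0, False), (2.5, False)]:
        print(t, monitor.tick(t, visible, "VISION_ACTIVE").value)
```

In the real tree, a FAILURE from this node would let the parent fall back to the GPS-chase behavior, as described for `tag_loss_timeout`.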
- The system starts with takeoff to a predefined altitude
- The perception pipeline activates and searches for the tag bundle
- Once detected, EKF tracking is initialized
- MPC takes over for trajectory tracking
- Controlled descent begins while maintaining visibility
- Final landing is executed when alignment criteria are met
- Reactivity: the system can respond to loss of detection
- Modularity: each behavior is independently defined
- Robustness: failure in one node does not crash the entire system
- Ubuntu 22.04
- ROS 2 Humble
- PX4-Autopilot
- Gazebo (Ignition) Garden
- Python 3.8+
mkdir -p ~/px4_ros2_ws/src
cd ~/px4_ros2_ws/src
# Clone this repository
git clone https://github.com/09priyamgupta/px4_vision_landing_moving_target.git
# Clone the custom PX4-Gazebo Bridge
git clone https://github.com/09priyamgupta/px4_gz_bridge.git
# Build the workspace
cd ~/px4_ros2_ws
colcon build --symlink-install
source install/setup.bash
This project relies on several external Python libraries for optimization, math, and mission management. Install them using pip and apt:
pip install cvxpy py_trees scipy transforms3d
sudo apt-get install ros-humble-tf-transformations
This project uses custom Gazebo models and a custom world, which must be manually installed into the PX4 Gazebo directory.
Copy the models/ and worlds/ folders from this repository into your PX4 installation:
# Copy models
cp -r ~/px4_ros2_ws/src/px4_vision_landing_moving_target/models/* ~/PX4-Autopilot/Tools/simulation/gz/models/
# Copy worlds
cp -r ~/px4_ros2_ws/src/px4_vision_landing_moving_target/worlds/* ~/PX4-Autopilot/Tools/simulation/gz/worlds/
Important: PX4 will not detect the custom AprilTag bundle model or the custom moving-target world unless these files are placed in the Gazebo search path shown above.
All results were generated from Gazebo simulation with the rover driving a straight-line trajectory at approximately 0.5 m/s. Logs are recorded by TFStateLogger to CSV and post-processed with analysis/plot_rover_validation.py.
The drone successfully converges from its takeoff position to the moving rover and maintains tight spatial alignment throughout the descent. The GPS-chase phase (before tag acquisition) shows larger positional offset; once vision takes over, the lateral error drops significantly.
Drone velocity closely tracks rover velocity in both North and East components. The EMA filter (α = 0.15) on the KF velocity output effectively suppresses high-frequency jumps while preserving the low-frequency trend needed for feedforward compensation.
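The exponential moving average mentioned above is a one-line recursion. The sketch below uses the stated α = 0.15; the class name and the choice to initialize the filter with the first sample are assumptions for illustration.

```python
class EmaFilter:
    """Exponential moving average: y_k = alpha * x_k + (1 - alpha) * y_{k-1}."""

    def __init__(self, alpha=0.15):
        self.alpha = alpha
        self.value = None

    def update(self, sample):
        if self.value is None:
            self.value = float(sample)  # assumed initialization: first sample
        else:
            self.value = self.alpha * sample + (1.0 - self.alpha) * self.value
        return self.value


if __name__ == "__main__":
    ema = EmaFilter(alpha=0.15)
    # A velocity estimate with a single-sample jump at index 3.
    for v in [0.5, 0.5, 0.5, 1.5, 0.5, 0.5]:
        print(round(ema.update(v), 4))
```

With α = 0.15 a one-sample jump of 1.0 m/s moves the output by only 0.15 m/s, at the cost of a slower response to genuine velocity changes.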
The allowed horizontal error (orange) shrinks with altitude, forming a cone of acceptance. The actual error (blue) remains within the cone throughout descent, confirming that the funnel constraint is satisfied and the landing sequence proceeds correctly.
Comparison of raw GPS position measurements against EKF-filtered estimates. The EKF produces significantly smoother position and velocity estimates, which directly reduces oscillatory behavior in the MPC output.
Horizontal tracking error over time from tag acquisition to touchdown. Errors converge monotonically with no divergence or instability, confirming the effectiveness of the relative state formulation and FOV-constrained MPC.
| Decision | Justification | Impact |
|---|---|---|
| EKF on rover | Needed velocity estimation | Enables prediction |
| MPC over PID | Handles constraints explicitly | Stable tracking |
| Multi-scale tags | Prevents detection loss | Robust perception |
| Relative state control | Simplifies control problem | Faster convergence |
| Behavior Tree | Modular mission logic | Robust execution |
The system requires six terminals running simultaneously. Start them in order — each subsequent terminal depends on the previous ones being ready.
The DDS agent bridges PX4 uORB messages to ROS 2 topics. This must be running before PX4 starts.
cd ~/PX4-Autopilot
MicroXRCEAgent udp4 -p 8888

Wait for the agent to print `Running` before proceeding.
Launches PX4 in software-in-the-loop mode with the custom world and x500 drone with downward-facing camera.
cd ~/PX4-Autopilot
export PX4_GZ_WORLD=px4_vision_landing_moving_target
make px4_sitl gz_x500_mono_cam_down

Wait for Gazebo to fully load and PX4 to print `[commander] Ready for takeoff!` before proceeding.
Provides real-time telemetry, arming status, battery level, and manual override capability.
./QGroundControl.AppImage

QGC will auto-connect to PX4 SITL via UDP.
Starts the topic bridges that relay camera images, GPS, IMU, and odometry from Gazebo to ROS 2.
cd ~/px4_ros2_ws
source install/setup.bash
ros2 launch px4_gz_bridge start_bridges.launch.py

Verify that `/x500/camera/image_raw` and `/r1/gps/fix` topics are publishing:

ros2 topic list | grep -E "camera|gps"

rviz2 -d ~/px4_ros2_ws/src/px4_vision_landing_moving_target/apriltag_land.rviz

The provided .rviz config includes: camera feed, drone marker, rover marker, AprilTag cube, MPC predicted path, rover predicted path, FOV footprint, and drone trajectory trail.
Launches all perception, state estimation, control, and mission management nodes.
cd ~/px4_ros2_ws
source install/setup.bash
ros2 launch px4_vision_landing_moving_target landing.launch.py

Once the drone is tracking the tag (mission state = `VISION_ACTIVE`), trigger the descent via the ROS 2 service:

ros2 service call /landing/trigger_landing std_srvs/srv/Trigger {}

The drone will begin the funnel descent and automatically disarm on touchdown.
ros2 launch px4_vision_landing_moving_target landing.launch.py \
target_altitude:=4.0 \
tag_loss_timeout:=2.0 \
mpc_horizon:=20 \
control_rate:=20.0

| Argument | Default | Description |
|---|---|---|
| `target_altitude` | `4.0` | Takeoff and GPS-chase altitude in meters |
| `tag_loss_timeout` | `2.0` | Seconds before declaring tag lost and reverting to GPS chase |
| `mpc_horizon` | `20` | Number of MPC prediction steps |
| `control_rate` | `20.0` | Control loop frequency in Hz |
All nodes load frame names from this single file. To adapt the system to a different robot or sensor setup, only this file needs to be modified.
frames:
  world: map
  drone: base_link
  camera: x500_mono_cam_down_0/camera_link/imager
  tag: landing_pad # Must match the bundle name in apriltag.yaml
  rover_odom: r1_rover/odom
  rover_base: r1_rover/base_link
  rover_relative: rover_relative

# config/apriltag.yaml
family: "36h11"
tag_bundles:
  bundle_names: ["landing_pad"]
  landing_pad:
    ids: [19, 23, 11]
    19: { size: 0.05, transform: [0.00, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] } # Small
    23: { size: 0.40, transform: [0.15, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] } # Large
    11: { size: 0.15, transform: [-0.10, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0] } # Medium

Q = np.diag([0.01, 0.01, 0.1, 0.1, 0.01]) # Process noise: [x, y, vx, vy, yaw]
R_gps = np.diag([1.5, 1.5]) # GPS measurement noise (m²)
R_odom = np.diag([0.1, 0.1]) # Odometry velocity noise ((m/s)²)

Q = np.eye(2) * 14.0 # Position tracking weight
R = np.eye(2) * 8.0 # Velocity effort weight (match tag speed)
R_diff = np.eye(2) * 4.0 # Jerk penalty (smooth braking)
W_slack = 1000.0 # FOV soft constraint penalty
max_vel = 1.0 # m/s velocity saturation

- Simulation only. All results are from Gazebo SITL. Hardware validation has not been performed.
- Ideal sensor models. Gazebo GPS and IMU noise does not fully replicate real-world sensor characteristics.
- Linear MPC. The current formulation uses a linear drone dynamics model. Aggressive maneuvers or strong wind disturbances would benefit from a nonlinear MPC.
- No obstacle avoidance. The system assumes a clear operational volume between the drone and rover.
- Static camera calibration. In-flight vibration and lens distortion are not modelled.
- Nonlinear MPC implementation
- Adaptive EKF tuning
- Improved perception robustness
- Real-world hardware deployment
- Sensor fusion (IMU + vision + GPS)
- Robust detection under varying conditions
- Multi-UAV cooperative landing
- Dynamic target interception
- Deployment in real-world applications (logistics, defense, etc.)
Autonomous landing on a moving target is fundamentally a:
Prediction + Estimation + Constraint Satisfaction problem
This work demonstrates that combining:
- Robust perception (AprilTag bundle)
- State estimation (EKF)
- Relative control formulation
- Constraint-aware optimal control (MPC)
enables reliable and stable landing on dynamic platforms.
If you find this work useful in your research or projects, please consider citing it:
@misc{gupta2026autonomouslanding,
author = {Gupta, Priyam},
title = {Vision-Based Autonomous Landing on a Moving Target using PX4, ROS 2, and MPC},
year = {2026},
publisher = {GitHub},
howpublished = {\url{https://github.com/09priyamgupta/px4_vision_landing_moving_target}},
note = {GitHub repository}
}

Priyam Gupta
This project is licensed under the MIT License. See LICENSE for details.







