Transformer & Optimization Based High Altitude GPS-Denied Fusion

1. SIFT-Based Prototype

In my original GSoC post, I described a proof-of-concept NGPS pipeline that relied on SIFT feature matching against a stitched map. While functional, it generally failed under varying lighting and texture conditions, required an existing database, and its update rate oscillated between 0.5 Hz and 3 Hz.

2. Transformer-Based VPS & VIO

To improve on this, I replaced SIFT with a learned matching approach. I extract sparse features using SuperPoint, then feed them into a lightweight transformer model that matches descriptors dynamically at inference time, giving a high output rate. I paired it with a VIO back-end, which provided local poses at 10 Hz or higher, depending on the configuration and the input data rates.

3. EKF Extension Attempt

Before moving to global optimization, I tried to extend ArduPilot’s AP_NavEKF3 to fuse both VIO poses and coarse VPS estimates. I modified the EKF update step and added a drop-state measurement model, but the extra covariance propagation and measurement updates increased processing and memory usage and triggered watchdog errors. I’m sure @rmackay9 sir might have an explanation for this; I suspect my own code changes introduced some inefficiencies.
In any case, I didn’t work on it further as I was eager to implement offboard estimation.

4. IMU + Optical-Flow + VPS: Optimization-Based Fusion

So finally I resorted to fusing all constraints in a two-stage optimization pipeline built on a modified VINS algorithm. The first stage is a local VIO step that uses the High-Res IMU at 100 Hz plus Kanade-Lucas-Tomasi sparse optical flow, with an output rate of around 20 Hz. The second stage is a global optimization step using VPS. The target state vector (4-DOF pose, i.e. position and yaw, plus velocity) is optimized over IMU preintegration residuals and optical-flow reprojection errors in the first stage, and over a global optimization problem in the second. With a fixed-size sliding window, the optimizer provided estimates at ≥ 20 Hz on an i7 CPU with an RTX 3060 GPU, and tested at >10 Hz on a Jetson Orin NX 8 GB (inference only).
Altitude and yaw are handled by barometer and compass estimates from the onboard EKF, respectively.
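
As a rough sketch of what such a two-stage objective can look like, here is the standard sliding-window VINS form followed by a simple global alignment term. The exact residuals, weights and loss used in the modified version are not described in this post, so every symbol below is illustrative: $\mathcal{X}$ is the window of states, $r_{\mathcal{B}}$ and $r_{\mathcal{C}}$ are the IMU preintegration and optical-flow reprojection residuals, $\rho$ is a robust (e.g. Huber) loss, and $T$ is an assumed 4-DOF transform aligning the local VIO frame to the VPS frame.

```latex
% Stage 1 (local VIO), standard sliding-window form:
\min_{\mathcal{X}} \;
\sum_{k \in \mathcal{B}} \left\| r_{\mathcal{B}}\!\left(\hat{z}_{b_k b_{k+1}}, \mathcal{X}\right) \right\|^{2}_{\Sigma_{k}}
+ \sum_{(l,j) \in \mathcal{C}} \rho\!\left( \left\| r_{\mathcal{C}}\!\left(\hat{z}_{lj}, \mathcal{X}\right) \right\|^{2}_{\Sigma_{lj}} \right)

% Stage 2 (global alignment against VPS fixes), assumed form:
\min_{T} \; \sum_{i} \rho\!\left( \left\| p^{\mathrm{vps}}_{i} - T\, p^{\mathrm{vio}}_{i} \right\|^{2}_{\Sigma^{\mathrm{vps}}_{i}} \right)
```

Here $\|r\|^{2}_{\Sigma} = r^{\top}\Sigma^{-1}r$ is the covariance-weighted squared norm, so noisier measurements contribute less to the cost.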


Estimated UAV trajectory at 150 m altitude overlaid on satellite imagery. Left: raw VPS. Middle: VPS + moving average. Right: VPS + Gaussian averaging.
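
For readers who want to reproduce the comparison between the two smoothing variants, here is a minimal NumPy sketch of a moving average and a Gaussian-weighted average over a 2D track. The function names, window size and sigma are my own placeholders, not the values used for the plots.

```python
import numpy as np

def _smooth(xy, kernel):
    """Convolve each column of an (N, 2) track with a normalized kernel."""
    if len(kernel) % 2 == 0:
        raise ValueError("window must be odd so the filter stays centered")
    xy = np.asarray(xy, dtype=float)
    pad = len(kernel) // 2
    # Repeat the edge samples so the output has the same length as the input.
    padded = np.pad(xy, ((pad, pad), (0, 0)), mode="edge")
    return np.column_stack(
        [np.convolve(padded[:, i], kernel, mode="valid") for i in range(xy.shape[1])]
    )

def moving_average(xy, window=5):
    """Equal-weight centered average over `window` samples."""
    return _smooth(xy, np.ones(window) / window)

def gaussian_average(xy, window=5, sigma=1.0):
    """Centered average with Gaussian weights (sigma in samples)."""
    offsets = np.arange(window) - (window - 1) / 2.0
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    return _smooth(xy, kernel / kernel.sum())
```

Both filters here are centered, so they need a few future samples. That is fine for offline plots; an online version would need a causal window instead, at the cost of some lag.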

Downward-looking camera footage (×4 speed) used for VPS map matching during a flight at 150 m altitude.

Click on this to redirect to the Raw VPS estimation inferencing visualization video

5. UKF Prototype vs. Direct Optimization

I also prototyped an Unscented Kalman Filter to fuse VIO and VPS. As expected, the UKF provided pretty good estimates, but I went with optimization-based fusion because … I just wanted to : ) The measurement model I designed for the UKF is a simple drop-state model, since the inputs are already expressed in terms of the states.
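
To illustrate what a drop-state measurement model means in practice, here is a minimal sketch. The state layout and the set of measured indices are hypothetical, chosen only for the example; the actual UKF module may order or select states differently.

```python
import numpy as np

# Hypothetical state layout (an assumption, not the released code):
# [px, py, pz, vx, vy, vz, yaw]
MEASURED = [0, 1, 2, 6]  # position and yaw are assumed to be observed directly

def h_drop_state(x):
    """Measurement function: keep the observed states, drop the rest."""
    return np.asarray(x, dtype=float)[MEASURED]

def wrap_angle(a):
    """Wrap an angle to [-pi, pi)."""
    return (a + np.pi) % (2.0 * np.pi) - np.pi

def residual(z, z_pred):
    """Innovation z - h(x), with the yaw component wrapped."""
    r = np.asarray(z, dtype=float) - np.asarray(z_pred, dtype=float)
    r[-1] = wrap_angle(r[-1])
    return r
```

In a UKF, `h_drop_state` would be applied to every sigma point. Because it only selects components, the one place that needs care is the yaw difference, which has to be wrapped so that a step from +179° to -179° is not treated as a 358° innovation.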

6. Testing and Development

All components run as ROS 2 nodes: fusion_node for VIO, vit_matcher_node for VPS, and ngps_fusion_node for the global optimizer. I initially used MAVROS as the messaging layer, but the final version will shift to native AP DDS. I simulated everything on Gazebo Harmonic and ROS 2 Humble.

Simulation results: ground truth vs. raw VPS vs. VPS fused with VIO.
A 3-point triangle trajectory simulated in Gazebo.

The ROS 2 bag file for these tests is around 14 GB. I am still figuring out a way to share it.

7. Calibrations

Mainly for the hardware setup, I measured the IMU noise and bias and used Kalibr to find the camera-IMU time offset and extrinsics.
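
As a rough illustration of the IMU part, here is a sketch that estimates per-axis bias and white-noise density from a stationary log. This is a simplified stand-in, not necessarily the procedure I followed: Kalibr's IMU configuration also expects bias random-walk values, which are usually obtained from an Allan variance analysis of a much longer recording. The function name is made up for the example.

```python
import numpy as np

def static_imu_stats(samples, rate_hz):
    """Estimate bias and white-noise density from a stationary IMU log.

    samples: (N, 3) gyro (rad/s) or accel (m/s^2) readings taken at rest.
    rate_hz: sampling rate of the log.
    Note: for an accelerometer the mean also contains gravity, which has
    to be removed separately before calling the result a bias.
    """
    samples = np.asarray(samples, dtype=float)
    dt = 1.0 / rate_hz
    bias = samples.mean(axis=0)
    sigma_discrete = samples.std(axis=0, ddof=1)
    # Discrete std to continuous-time density: sigma_c = sigma_d * sqrt(dt).
    # This assumes white noise dominates at the sampling rate.
    noise_density = sigma_discrete * np.sqrt(dt)
    return bias, noise_density
```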

8. Hardware Trials

For hardware trials, thanks to @kkouer for helping me with actual flight tests of the algorithm and for providing the dataset for training and optimization. We are still in the process of conducting safe online HITL tests.

Also @ppoirier has been a huge help all along : )

9. Code & Release Plan

I’m still refactoring the code into clean, modular ROS 2 packages and fixing bugs as I conduct tests. It’s a mess right now because of all the experimentation and development. I’ll push the standalone UKF module along with it. All repositories will be open-sourced in the coming weeks after the hardware tests are verified.

I have already open-sourced my Colab notebooks, which contain a crude implementation of the VPS algorithm. You can find them in my existing ap_ngps repo.

10. Future Plan

This project ran past the originally planned timeline because I underestimated the work involved : )

In the future, I am planning to add a loop closure algorithm based on traditional descriptors and BoW, with improved frame alignment and interpolated correction. The main reason for not using loop closure in the current implementation is that at high altitudes the accumulated drift can reach tens of meters, compared to centimeters for indoor positioning and estimation. A loop closure could cause large discontinuities in AP’s EKF if fed in directly, and that surely won’t be good for the drone. To avoid those jumps, a simple transformation plus interpolation, followed by a simple low-pass filter, might help blend the correction in over time.
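
A minimal sketch of that blending idea, assuming the loop closure boils down to a single x-y offset: the offset is ramped in linearly and then passed through a first-order low-pass filter before being added to the outgoing positions. The function name, `blend_steps` and `alpha` are placeholders I picked for illustration, nothing here is tuned or tested on the vehicle.

```python
import numpy as np

def blend_correction(poses_xy, correction_xy, blend_steps=50, alpha=0.2):
    """Spread a loop-closure position jump over many samples.

    poses_xy:      (N, 2) uncorrected positions produced after the closure.
    correction_xy: total x-y offset suggested by the loop closure.
    blend_steps:   number of samples over which the offset is ramped in.
    alpha:         low-pass gain in (0, 1]; smaller means smoother.
    """
    poses_xy = np.asarray(poses_xy, dtype=float)
    correction_xy = np.asarray(correction_xy, dtype=float)
    out = np.empty_like(poses_xy)
    applied = np.zeros(2)
    for k, p in enumerate(poses_xy):
        # Interpolation: the target offset ramps linearly from 0 to full.
        ramp = min(1.0, (k + 1) / blend_steps)
        target = ramp * correction_xy
        # First-order low-pass on the offset that is actually applied.
        applied = applied + alpha * (target - applied)
        out[k] = p + applied
    return out
```

With a 10 m correction, 50 blend steps and `alpha=0.2`, the per-sample change in the output stays around 0.2 m instead of a single 10 m jump, which is the kind of behaviour the EKF should tolerate better.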

Apart from this, I am confident that using SfM we can build a full-fledged SLAM system with a larger-baseline stereo setup for higher-precision mapping.

Most existing solutions out there have some drift in their non-GPS positioning, but I aim to nullify that and make it really robust. It still needs a lot of work to get it where I want, but it has been a real learning roller coaster.

Thanks : )

Really well done! Congrats, that is a huge and trending task!

I am really eager to see it in a live demo!

It’s a very arduous task, and I’m very honored to be able to provide a little assistance.

Thanks a lot @khancyr! Will be doing the live hardware run in the coming weeks : )

Great to hear there is lots of progress!
Looking forward to later iterations where an optimized matcher could work on an embedded-friendly target like an RPi :slight_smile:

Hi @snktshrma, thank you for sharing this excellent blog. It’s commendable work. My research team is currently developing a similar system as part of our ongoing research, and you have done great work for ArduPilot. If possible, consider publishing a research paper on this. It would definitely add value.

Hi @snktshrma, congrats! I noticed you modified VINS to fuse optical flow and IMU data (4-DOF: x, y, z and yaw, I guess?). I took a deep look into VINS: the IMU provides 6 channels of data (3 gyro, 3 accel), which represent 3 angular values (yaw, pitch, roll) and 3 position values (x, y, z). But the optical flow you use can only represent 4 values (x, y and yaw, with z coming from the barometer). So do you mean we actually give up roll and pitch?

Hi @JR_C! Thanks.
So no, you don’t give up roll and pitch. The algorithm still pre-integrates gyro rates to estimate the full 3-DOF attitude, and the accel-derived gravity vector anchors roll and pitch. Optical flow (and the baro for altitude) then constrains the x-y estimates and yaw drift. I chose to output only the estimated pose and yaw and feed them into AP’s EKF. As position and attitude are coupled states, the EKF’s correction step refines roll and pitch as well. So you still estimate and correct the full pose, including roll and pitch.
I hope it makes sense : )
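
To make the "gravity anchors roll and pitch" point concrete, here is the textbook relation for a near-static vehicle. It is only an illustration of why the accelerometer makes those two angles observable, not how the optimizer handles it internally. It assumes a forward-right-down body frame, so a level vehicle at rest reads roughly (0, 0, -9.81) m/s².

```python
import math

def roll_pitch_from_accel(ax, ay, az):
    """Roll and pitch (rad) from body-frame specific force in m/s^2.

    Assumes a forward-right-down body frame and negligible vehicle
    acceleration, so the accelerometer measures only the reaction to gravity.
    Yaw cannot be recovered this way: rotating about the gravity vector
    leaves the reading unchanged.
    """
    roll = math.atan2(-ay, -az)
    pitch = math.atan2(ax, math.hypot(ay, az))
    return roll, pitch
```

This is also why yaw needs another source (compass, VPS or optical flow) while roll and pitch stay bounded from the IMU alone.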

So the whole structure is still VINS (but using optical flow + IMU) → produces pose and yaw → EKF, is this right? Many thanks!

Essentially yes, @JR_C. For this specific use case, VINS has been adapted with modifications to the objective function and the Huber loss, as I made some assumptions for the current use case of high-altitude non-GPS flight, plus there’s an external VPS. I’ll share the detailed changes once the current tests are complete.

Thanks for sharing! By the way, have you tried using only VPS and sending the position to the EKF (since VPS can calculate x, y and yaw too)? And I want to know: the structure you used above aims to raise the output frequency, right?

So I wonder, if I only use VPS, what’s the lowest frequency at which I should send data to the EKF? I just don’t know how to calculate it. Does APM mention it on their webpage?

Yes, I tried, and the results were not that good. In short, the main reasons were high covariance and fluctuating data frequency.
The structure I used helps with high-frequency data as well as more stable pose estimates.

So it should be at least 3-4 Hz for the EKF to accept the estimates. The data lag that is currently configurable only goes up to 250 ms, but you can obviously increase it to cater for low-rate data. The EKF will then linearize and extrapolate the position, but the overall innovation will increase for fast maneuvers or very non-linear motions.

IIRC, the EKF starts complaining when GPS (the absolute position source) goes below 5 Hz.

Thanks! Does it have any accuracy requirements? Like, compared to the real position, a position error under 5 meters?

Thanks a lot! :grinning_face: It’s a great help!

Not relative to true global values (as they aren’t known :smile: ), but if your position estimates aren’t consistent (accounting for variances) with other sensors, then it will throw a fit.