EKF3 altitude bug with visual odometry

I’m running firmware version 4.2.3 on multiple drones to test different visual odometry technologies feeding the EKF3 for xy position and speed, while keeping the traditional barometer or down-facing lidar as the main altitude source alongside the IMUs.
Mostly everything works well, but a few times, when the drone took off after landing or was flying very low (below 50 cm from the ground), the EKF3 went completely crazy and estimated an exponentially decreasing altitude. That makes the drone climb very fast and indefinitely towards the sky if it is in althold or loiter mode (the first time it happened I was unprepared and had to crash the drone so as not to lose it completely :pensive:).

Looking into the logs, I found that the EK3 altitude curve was very similar to (although mirrored relative to) the logged odometry VISP PZ, which apparently also went crazy. But how is it possible that the EKF3 altitude failed in the same way even though all the sensors selected for its estimation (barometer, lidar and IMU) were fine and healthy?

Here is the most interesting log I have, showing the exponentially decreasing ALT curve at the end of the flight, when I reduced the altitude below 30 cm in loiter mode (as you can see, this time I immediately stopped the drone with my hand and killed it): altitude_bug_with_visual_odometry.bin

I am also facing the same issue. Can anyone from the developer team help?

@N0L12 thanks for reporting this!
what we need is an EKF replay log to find the issue.
can you possibly reproduce with LOG_REPLAY=1 and LOG_DISARMED=1?
with a replay log we can usually find the issue, and even better we can confirm the fix actually works by replaying the flight through the EKF

@tridge well received. I will keep those 2 parameters enabled and hope the bug happens again soon…

@tridge it finally happened again. As you can see at minute 5:55 of the attached log, the filter altitude estimate (CTUN Alt) starts to diverge, going down to -2 m, while the rangefinder and barometer are measuring the altitude correctly (the drone was hovering between 0.2 and 0.8 meters from the ground the whole time): VIO_bug_with_logreplay.BIN

After more than 10 s it seems the EK3 acknowledges that something went wrong and resets itself, but it cannot stop diverging.
FYI, as you can see from the sensors, when I saw the failure happening I held the drone with my hand to keep it from flying into the sky, and killed the motors after about 30 s.
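
For anyone who wants to locate the onset of this kind of divergence in their own log, here is a minimal Python sketch. It assumes the EKF altitude (e.g. CTUN Alt) and a reference altitude (rangefinder or barometer) have already been extracted from the log and resampled onto a common time base; the function name, the 0.5 m threshold and the 1 s minimum duration are illustrative assumptions, not values from ArduPilot.

```python
from typing import Optional, Sequence


def first_divergence_time(
    times_s: Sequence[float],
    ekf_alt_m: Sequence[float],
    ref_alt_m: Sequence[float],
    threshold_m: float = 0.5,      # assumed tolerance between EKF and reference
    min_duration_s: float = 1.0,   # assumed time the error must persist
) -> Optional[float]:
    """Return the time at which the EKF altitude starts to disagree with the
    reference by more than threshold_m for at least min_duration_s,
    or None if that never happens."""
    start = None
    for t, ekf, ref in zip(times_s, ekf_alt_m, ref_alt_m):
        if abs(ekf - ref) > threshold_m:
            if start is None:
                start = t          # first sample of a candidate divergence
            if t - start >= min_duration_s:
                return start       # error persisted long enough
        else:
            start = None           # estimates agree again, reset the window
    return None


if __name__ == "__main__":
    # Synthetic data only: reference stays at 0.5 m, EKF estimate runs away.
    t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    ekf = [0.5, 0.5, 0.4, -0.5, -1.5, -2.0]
    ref = [0.5] * 6
    print(first_divergence_time(t, ekf, ref))
```

The persistence check is there so that a single noisy sample does not get reported as the start of the failure.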

This particular time the VIO seems to have stopped (or at least stopped logging) at the beginning of the altitude failure, which I understand can upset the EK3, especially because it is the main source not only for xy position but also for yaw. What surprises me is that the filter not only fails to recognize the failure but also cannot recover from it in any way.

I look forward to your expert comments…

I have experienced this as well over time working with t265.

In the end, our workaround was to detect jumps, or anomalously high changes in speed/position, and when one happens simply stop feeding the EK3 with viso data. It is a small code workaround.
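
To make the idea concrete, here is a minimal Python sketch of such a gate. It is not the code from that project: the class name, the 5 m/s limit and the latching behaviour are assumptions for illustration, and the actual forwarding of the accepted samples to the autopilot is left out.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


class VisoJumpGate:
    """Hypothetical filter placed between the VIO source and the autopilot.

    It rejects a sample when the speed implied by two consecutive positions
    exceeds max_speed_m_s, and then keeps rejecting until reset() is called.
    """

    def __init__(self, max_speed_m_s: float = 5.0) -> None:
        self.max_speed_m_s = max_speed_m_s  # assumed limit, tune per vehicle
        self.tripped = False
        self._last_t: Optional[float] = None
        self._last_pos: Optional[Vec3] = None

    def accept(self, t_s: float, pos_m: Vec3) -> bool:
        """Return True if the sample may be forwarded to the EKF."""
        if self.tripped:
            return False
        if self._last_t is not None and self._last_pos is not None:
            dt = t_s - self._last_t
            if dt <= 0.0:
                return False  # drop out-of-order or duplicate timestamps
            implied_speed = math.dist(pos_m, self._last_pos) / dt
            if implied_speed > self.max_speed_m_s:
                self.tripped = True  # jump detected: stop feeding viso data
                return False
        self._last_t = t_s
        self._last_pos = pos_m
        return True

    def reset(self) -> None:
        """Re-arm the gate, e.g. after the VIO source has been restarted."""
        self.tripped = False
        self._last_t = None
        self._last_pos = None


if __name__ == "__main__":
    gate = VisoJumpGate(max_speed_m_s=5.0)
    samples = [(0.0, (0.0, 0.0, 0.0)), (0.1, (0.1, 0.0, 0.0)), (0.2, (0.1, 0.0, -3.0))]
    for t, p in samples:
        print(t, gate.accept(t, p))
```

Latching after the first jump mirrors the "just stop feeding" behaviour described above; whether to resume automatically is a design choice that depends on how the VIO source recovers.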

I think over time there have been several efforts to look into this, and some fixes to EK3_SRC, but for us the problem persisted, so we used the workaround described above to keep the system reliable enough.

I would love to help with the logs, but I have not been in contact with that project for a long time, I don’t have the setup to test, and I barely remember the details. I just wanted to mention that we experienced it as well, in several AP versions.

@N0L12 thanks for the replay log. The log confirms the bug is there in 4.2.x, but is fixed in 4.3.x (I tested 4.3.7).

@tridge that is very good news! I will try 4.3.7 asap…

On this subject, can you change VISO parameters in replay? I have an issue with the XY speed estimation while using external position input.

Sorry for the delay. I don’t understand which VISO parameters you want me to change and test. Can you give more detail?

I am trying to change VISO_POS_M_NSE and VISO_SCALE, but when I try to update them in REPLAY, it doesn’t let me.

I found this error in 4.3.7 too.