[Log Analysis] Questions about expected behavior on compass failure


We had a crash today with one of our spray drones (running v3.6.9) which was primarily due to a compass failure. I’ve attached the log at the bottom of this post. Here is my analysis:

  • The copter took off fine and was en route to its start point.
  • Soon afterwards, MAG1 started glitching heavily. At first this triggered a couple of YAW_RESETS. MAG2 read fine for the whole flight.
  • Eventually, MAG1 was so degraded that the EKF failsafe triggered and the copter entered LAND mode.
  • Unfortunately, by this point the copter was traveling at 12 m/s and was only ~4 m off the ground, leading to a very high-speed landing that caused the crash.

I had a couple of questions about this:

  1. We use 2 HERE2s as our compasses/GPSs. Why did the copter not switch to the other compass when the primary compass failed?
  2. How/why did the copter reach a groundspeed of 12 m/s when WPNAV_SPEED was set to 6 m/s?

I am frankly terrified by this event because it seems like compass redundancy basically did not work, which defeats half the purpose of having 2 HERE2s on the copter.

Copter config:

  • Hexacopter running CubeBlack, Copter v3.6.9
  • 2 HERE2 GNSS (latest firmware used from CubePilot) on I2C. We cannot update to CAN mode right now.
  • Both compasses/GPSs are enabled, internal MAG is disabled.

Let me know if you have any questions.

Thank you for the assistance.

U021_1568211258.bin (380 KB)


Looked at the log. The EKF did not seem to respond correctly to the bad mag readings:

Here, you can see the two mags are consistent with each other, except for the occasional bad value on mag 1. Unfortunately, the EKF was somehow tricked into treating the bad spiked value as the “real” reading to be trusted and the consistent real value as wrong. This is evidenced by the magX innovation increasing (and the yaw direction changing) immediately after the spike.

So basically, the problems are:

  1. A single bad value confused the EKF
  2. Both EKFs were using the same mag, so switching EKFs wouldn’t have changed anything (and it failsafed instead of switching)
  3. The second mag was never used

@rmackay9 Are 2. and 3. expected behavior? The EKF didn’t try switching lanes. Are there provisions for magnetometer failover, or for using different mags on different EKFs? Also, point 1. looks disturbingly like the flyaway that was caused by the EKF being fooled by a single lidar reading. Worth bringing to Paul’s attention?

This does look like a bug. The copter didn’t switch to the 2nd compass because that compass also had very high innovations relative to what the EKF thought the current yaw was. The key issue is that a single bad reading appears to have triggered a yaw reset to the wrong yaw. Once that yaw reset happened, both compasses had high innovations, so neither would be selected.
This PR would fix the issue:

but I think we should also make the yaw reset logic more resilient to glitches.
I also wonder why the glitches happened at all. I haven’t seen glitches like that on the Here2 compass previously.
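
To illustrate what a more glitch-tolerant yaw reset could look like, here is a minimal Python sketch. It is not the ArduPilot code: the class name `YawResetGuard`, its parameters, and the thresholds are all hypothetical. The idea is to allow a reset only when the last several compass-derived yaw samples agree with each other and also disagree with the current estimate, so a single spike cannot trigger it.

```python
import math
from collections import deque


def wrap_pi(angle):
    """Wrap an angle in radians to the range [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


class YawResetGuard:
    """Hypothetical guard: permit a yaw reset only after sustained disagreement."""

    def __init__(self, required_samples=5,
                 agreement_tol=math.radians(10),
                 reset_threshold=math.radians(30)):
        # All three values are illustrative assumptions, not tuned numbers.
        self.agreement_tol = agreement_tol
        self.reset_threshold = reset_threshold
        self.history = deque(maxlen=required_samples)

    def update(self, mag_yaw, estimated_yaw):
        """Record one compass-derived yaw sample; return True if a reset is justified."""
        self.history.append(mag_yaw)
        if len(self.history) < self.history.maxlen:
            return False  # not enough evidence yet
        newest = self.history[-1]
        # Every recent sample must agree with the newest one...
        consistent = all(abs(wrap_pi(y - newest)) <= self.agreement_tol
                         for y in self.history)
        # ...and the newest one must be far from the current yaw estimate.
        disagrees = abs(wrap_pi(newest - estimated_yaw)) > self.reset_threshold
        return consistent and disagrees
```

With `required_samples=3`, a lone spike between good readings never returns `True`; only three consecutive samples that agree with each other and differ from the estimate do. The cost is a delay of a few samples before a genuine yaw error gets corrected.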

Ah, yes, that makes sense.

Looks like this PR would help with switching EKF cores, but what about the underlying issue where the EKF accepted a glitch value that was greater than its innovation gate? (Well, I think it was: the consistency check was 1.7.) Shouldn’t it have rejected the reading, making lane changing unnecessary?
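
For reference, this is roughly what I mean by a gate check, as a generic Python sketch. It is a common textbook form of innovation gating, not a copy of the ArduPilot implementation, and the function names and numbers are made up for illustration. The squared innovation is divided by the squared gate width (in standard deviations) times the innovation variance, and the reading is used only if that ratio is below 1.

```python
def innovation_test_ratio(innovation, innovation_variance, gate_sigma):
    """Squared innovation normalised by (gate width in sigma)^2 * variance."""
    return innovation ** 2 / (gate_sigma ** 2 * innovation_variance)


def accept_measurement(innovation, innovation_variance, gate_sigma):
    """A ratio below 1 means the reading lies inside the gate and may be fused."""
    return innovation_test_ratio(innovation, innovation_variance, gate_sigma) < 1.0
```

With a 3-sigma gate and an innovation variance of 0.01 (made-up values), an innovation of 0.1 gives a ratio of about 0.11 and is accepted, while an innovation of 0.4 gives about 1.78 and is rejected. If the 1.7 in the log is a ratio of this kind, I would have expected the reading to be rejected rather than fused.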