Z controller altitude inconsistent with EKF altitude

I have multiple copters running Copter 3.6.9 (Cube Black) that have a strange altitude control problem. The EKF altitude estimate is correct, but it disagrees with the altitude logged by the position controller in the CTUN and POS messages. Unfortunately, those are the values ultimately used for control, so the drone does not maintain altitude correctly. Here is an example:

Here, the EKF vertical position PD (I inverted it) agrees with the barometer and the rangefinder. However, the Z controller is using the altitude reported in CTUN.Alt or POS.Alt, which diverge severely from the EKF altitude. The rangefinder can be used as a source of truth in this case.

This is in contrast to a typical flight, which looks like this.

Here, the EKF, CTUN, and POS altitudes are all right on top of each other. This is normal and makes sense, because the controller is supposed to be using the EKF estimate, and these values are all being logged from the same source (see below). Worth noting that this flight is from the same drone a few days earlier with the same parameters.

Ultimately, altitude is derived from climb rate, so just as a demonstration, here is the first flight again with climb rates and vertical velocities graphed:

The CTUN climb rate quickly diverges from the EKF vertical velocity. The GPS vertical velocity was not being used by the EKF at the time (EK2_GPS_TYPE = 1), so the GPS can be used as a third source in this case, and it agrees with the EKF.

So, how did the Z controller diverge from the EKF? I can’t figure it out. According to the source code, the logged values come from the same place:

NKF2.PD just logs the value returned by that function directly: https://github.com/ArduPilot/ardupilot/blob/Copter-3.6/libraries/DataFlash/LogFile.cpp#L909

For CTUN.Alt, that same value goes through a frame and units conversion here: https://github.com/ArduPilot/ardupilot/blob/Copter-3.6/libraries/AP_InertialNav/AP_InertialNav_NavEKF.cpp#L36-L37

then gets passed unmodified by https://github.com/ArduPilot/ardupilot/blob/Copter-3.6/libraries/AP_InertialNav/AP_InertialNav_NavEKF.cpp#L115 for logging.
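To make the frame and units conversion concrete, here is a minimal Python sketch of what I understand that step to do: the EKF reports position and velocity in NED (down positive, metres), while the inertial nav layer works in centimetres with up positive. The function names are mine for illustration, not ArduPilot's.

```python
def pd_to_alt_cm(pos_down_m: float) -> float:
    """Convert an EKF 'down' position (NED, metres) to altitude (up positive, cm)."""
    # Flip the sign (down -> up) and scale metres to centimetres.
    return -pos_down_m * 100.0


def vd_to_climb_rate_cms(vel_down_ms: float) -> float:
    """Convert an EKF 'down' velocity (NED, m/s) to climb rate (up positive, cm/s)."""
    return -vel_down_ms * 100.0
```

The point is that the conversion is a pure sign flip and scale, so if both logs were fed from the same EKF core they could not drift apart the way the graphs show.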

I can’t find any opportunity for EKF and CTUN logged values to disagree - it’s supposed to be the same number! Looking for some technical help because this is driving me (and my drones) nuts. @rmackay9 I hope you don’t mind the tag, I’m not sure who would know best what is going on here.

Here is the log for the bad flight graphed above.
log62 inconsistent ekf ctun alt.bin (632 KB)

I used a spreadsheet to integrate the IMU1 Z accelerometer values to see if it agreed with the CTUN or EKF climb rate. It agreed with the EKF. Still trying to figure out why the Z controller climb rate is going crazy.
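For anyone who wants to repeat the integration outside a spreadsheet, this is roughly the calculation as a Python sketch. It assumes the vehicle stays close to level (so body Z is approximately vertical), that the Z accelerometer reads about -9.8 m/s² at rest, and that the timestamps and Z accel samples have already been extracted from the log into lists. The function name and the gravity constant are my own choices.

```python
G = 9.80665  # standard gravity in m/s^2 (assumed; not read from the log)


def integrate_climb_rate(times_s, acc_z, v0=0.0):
    """Trapezoidal integration of Z accelerometer samples into climb rate (m/s, up positive).

    times_s: sample timestamps in seconds
    acc_z:   body-frame Z acceleration in m/s^2 (Z down, about -G at level rest)
    v0:      climb rate at the first sample
    """
    climb = [v0]
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Remove gravity and flip sign so that positive means accelerating upward.
        a_prev = -(acc_z[i - 1] + G)
        a_curr = -(acc_z[i] + G)
        climb.append(climb[-1] + 0.5 * (a_prev + a_curr) * dt)
    return climb
```

Because this is open-loop integration, any bias or tilt accumulates over time, so it is only useful for checking which climb rate trace it follows over a short window, not as an absolute reference.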

I figured it out:

  • The controller was using the third EKF core, running on IMU 3 (NKF4.PI = 2). The third core is not logged, so I couldn’t see what was going on.
  • IMU 3 was getting bad readings, causing bad climb rate estimation
  • It was using the third core because Copter 3.6.9 does not switch back to the first one while disarmed (3.6.10 does). This also explains why the problem is intermittent.
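The check for which core is primary can be sketched like this in Python. It assumes the NKF4.PI values have already been pulled out of the log as (time, primary index) pairs; the extraction itself is not shown, and the function name is hypothetical.

```python
def primary_core_spans(samples):
    """Collapse (time_s, primary_index) samples into spans of constant primary core.

    Returns a list of (start_time, end_time, primary_index) tuples in log order.
    """
    spans = []
    for t, idx in samples:
        if spans and spans[-1][2] == idx:
            # Same core as the previous sample: extend the current span.
            spans[-1] = (spans[-1][0], t, idx)
        else:
            # Primary core changed (or first sample): start a new span.
            spans.append((t, t, idx))
    return spans
```

For example, `primary_core_spans([(0.0, 0), (1.0, 0), (2.0, 2), (3.0, 2)])` gives two spans, the second one with index 2, meaning the third core was primary from t = 2.0 onward. A log where index 2 is already primary at takeoff would match the bad flights described here.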

Here’s IMU1 compared to IMU3. IMU3 is getting lots of weird negative spikes. Could be vibration because IMU3 is undamped, but I’ll need to do an FFT to determine that because the VIBE logs don’t look at IMU3.
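As a rough idea of what the FFT check would look like, here is a small Python sketch that finds the dominant frequency in a block of accelerometer samples using a plain discrete Fourier transform from the standard library. It assumes evenly spaced samples at a known rate, and the function name is my own. Note that it can only see frequencies up to half the rate at which the IMU samples were logged.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC component in samples.

    Uses a naive O(n^2) DFT, which is fine for a few thousand samples.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the gravity / DC offset
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        # DFT bin k: correlate the signal with a complex sinusoid at k cycles per block.
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centered))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    return best_k * sample_rate_hz / n
```

A sharp peak near the motor or prop frequency would point to vibration, while spikes with no clear peak would suggest something else, such as a sensor or bus problem.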

And here’s how I verified that it was the problem without having an EKF log. I integrated the Z accel readings for each IMU and compared them to CTUN.Crt (which was using EKF#3) and NKF1.VD.

IMU1 and 2 are consistent with EKF1, but IMU3 goes nuts. The EKF was probably rejecting some of the noisy readings, which is why my integration diverges even more than CTUN.Crt.

So the only remaining mystery is why I have bad IMU3 readings on multiple drones.