EKF3 position still going mad in beta5 - drone crashed

About one month ago, I described a situation where the EKF3 position estimate (vertical and horizontal) was going mad, here: EKF3 Position going mad - Bug in 4.1.0-beta3 or defective Cube Orange?
My hope that this had been fixed in beta5 was not confirmed. The “collapsing baro variance” may have been fixed, but the weird position estimation after a lane switch still happened. Unfortunately, this time it happened after the drone was airborne, and the drone crashed.

Configuration also here was using “GPS for Yaw” (RTCM proxied through the Pixhawk Orange).

The drone was supposed to take off and climb to 10 m. But right after the motors spun up and it left the ground, the copter shot to the right and climbed at high speed. A few seconds later it began to stagger and was caught by a tree.

Comparing the horizontal POS estimate (red) with the GPS track (blue) shows a strongly oscillating EKF3 position estimate:

The lat and lon values of GPS and POS also show the differences, and the oscillations in POS, which recovered after about 5 minutes:
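To put a number on how far POS drifts from GPS, a helper like the one below could compare time-aligned samples. It is a minimal sketch, assuming the lat/lon pairs have already been extracted from the POS and GPS log messages and paired by nearest timestamp; the function names are hypothetical, and the equirectangular approximation is only accurate over short distances.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; adequate for metre-scale comparisons


def horizontal_offset_m(lat1, lon1, lat2, lon2):
    """Approximate ground distance in metres between two lat/lon points (degrees),
    using a local equirectangular projection (valid only for short distances)."""
    lat1_r, lat2_r = math.radians(lat1), math.radians(lat2)
    d_lat = lat2_r - lat1_r
    # Scale the longitude difference by cos(mean latitude) to get an east-west arc.
    d_lon = math.radians(lon2 - lon1) * math.cos(0.5 * (lat1_r + lat2_r))
    return EARTH_RADIUS_M * math.hypot(d_lat, d_lon)


def max_divergence_m(pos_track, gps_track):
    """Largest POS-vs-GPS offset over already time-aligned (lat, lon) pairs."""
    return max(horizontal_offset_m(p[0], p[1], g[0], g[1])
               for p, g in zip(pos_track, gps_track))
```

For example, a 0.001° latitude offset comes out at roughly 111 m, so even small-looking differences in the lat/lon plots correspond to large position errors.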

The phase with the weird POS values was between two EKF lane switches, marked in the following diagram at 10:31:38.417 and 10:36:18.409:

The lane switches followed the errors “EKF_PRIMARY-2” and “EKF_PRIMARY-1”, logged at the same times (10:31:38.417 and 10:36:18.409).

Looking at the recorded altitudes, you can see a big difference between a successful flight about 10 minutes before …

and the crash flight:

Here, at the moment of “EKF3 lane switch 2”, the estimated vertical position (POS.alt) jumped down from about 61 to 0 meters (relative altitude from 0 to -61 m).

If you zoom out, you can see that POS.alt keeps going down continuously, to -1019 meters, until the next EKF3 lane switch corrects this error as well.

Taking a more detailed look at the Acc, Gyro, Baro, GPS and Mag data of the redundant sensors, you can see that they are all largely consistent. Only the mag data of the internal compass drifted somewhat away from the mag data of the external compasses at takeoff, due to the high current distorting the magnetic field near the power cables (which run close to the Pixhawk Cube Orange). This may have been the cause of “EKF3 lane switch 2” at takeoff. But it does NOT explain why the POS estimates of the new primary lane produced these weird oscillating values that made the drone go mad after the lane switch.

You can find the log-file here: https://kopterkraft.com/downloads-static/00000001.BIN

One week earlier, I also saw these weird POS values (oscillating horizontal position and continuously decreasing altitude) during a phase where the drone was just sitting on the ground:

That log file also contains several EKF3 lane switches, but in that case the last switch before the POS values started going mad was to lane 1, so I see no pattern. There, the vertical position estimate had already reached -3197 meters before I disconnected the power.

The log file can be downloaded here:
https://kopterkraft.com/downloads-static/2021-07-07%2012-06-56.bin

For part of that log, LOG_REPLAY was also set to 1, if that helps…

@tridge, @priseborough,

Could either of you have a look at this? The oscillation in the POS message doesn’t look good.

It selected the EKF for IMU3 (lane index 2) shortly after the second takeoff.

The IMU3 EKF became unhealthy whilst the vehicle was sitting disarmed on the ground before the second takeoff; note the constantly increasing vertical position innovation for the IMU3 EKF:

Note also that the vertical velocity innovation for the IMU3 EKF has a constant offset:

The covariance matrix appears to have developed a numerical fault that caused it to cyclically report low normalised innovations. Without pre-arm and replay logging we cannot be certain of the root cause, but it appears to be related to the stability issue that was fixed by EKF3: Covariance stability improvement by priseborough · Pull Request #18008 · ArduPilot/ardupilot · GitHub

The reason the IMU3 EKF was selected was that both the first and second IMU EKFs reported a transient increase in normalised GPS velocity innovations

whereas the faulty third IMU EKF did not:

I’ve looked into why the IMU1 and IMU2 EKFs reported high normalised velocity innovations.

Both GPS units report an increase in velocity error after takeoff, but for the second GPS it is particularly high:

Comparing the vertical speeds for the two receivers shows significant inconsistency:

Both the IMU1 and IMU2 EKFs show large vertical velocity innovation transients with no matching transient in the vertical position innovation. This, combined with the GPS-reported rise in velocity error (GPA.SAcc), makes it most likely that a GPS glitch affected both the IMU1 and IMU2 EKFs, which, combined with the fault in the IMU3 EKF, resulted in a lane switch.
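The reasoning above (a vertical velocity innovation spike without a matching position innovation spike, while the receiver itself reports worse speed accuracy) can be written as a simple per-sample heuristic for scanning a log. This is a hypothetical sketch: the field names and all three thresholds are illustrative assumptions, not EKF3 logic or real log field names, apart from the speed accuracy being the value logged as GPA.SAcc.

```python
from dataclasses import dataclass


@dataclass
class InnovationSample:
    vel_d_innov: float  # vertical (down) velocity innovation, m/s
    pos_d_innov: float  # vertical (down) position innovation, m
    gps_sacc: float     # receiver-reported speed accuracy, m/s (as in GPA.SAcc)


def looks_like_gps_velocity_glitch(s: InnovationSample,
                                   vel_thresh: float = 1.0,
                                   pos_thresh: float = 0.5,
                                   sacc_thresh: float = 0.5) -> bool:
    """Hypothetical heuristic: flag a sample when the velocity innovation is large,
    the position innovation stays small, and the GPS reports degraded speed accuracy."""
    return (abs(s.vel_d_innov) > vel_thresh
            and abs(s.pos_d_innov) < pos_thresh
            and s.gps_sacc > sacc_thresh)
```

Requiring all three conditions keeps the check from blaming the GPS when the position innovation also grows, which would point at something other than a GPS velocity glitch.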

The EKFs were using GPS1 at the time of the switch.
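The selection outcome described in this reply, where lanes with transiently high innovations lose out to a lane whose faulty covariance under-reports them, can be illustrated with a deliberately simplified chooser. This is a hypothetical sketch, not ArduPilot’s actual lane selection code: the function name, the single scalar score per lane, and the switching margin are all assumptions.

```python
def select_lane(scores, healthy, current, margin=0.3):
    """Return the index of the lane to use (simplified, hypothetical logic).

    scores  -- per-lane error score, e.g. built from normalised innovations; lower is better
    healthy -- per-lane health flags
    current -- index of the currently selected lane
    margin  -- how much better another lane must score before a switch
    """
    candidates = [i for i, ok in enumerate(healthy) if ok]
    if not candidates:
        return current  # nothing usable; keep the current lane
    best = min(candidates, key=lambda i: scores[i])
    if best != current and (not healthy[current] or scores[best] + margin < scores[current]):
        return best
    return current


# Lanes 0 and 1 see a GPS glitch; lane 2 has a faulty covariance and reports low scores.
chosen = select_lane([1.2, 1.1, 0.05], [True, True, True], current=0)
```

With these illustrative scores the chooser moves to lane 2, the one that is actually broken. A score-only comparison cannot tell “consistent with the data” apart from “covariance too large to notice the error”, which is why the health reporting and selection logic need reviewing together.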

The yaw angle estimate looks OK; both the IMU1 and IMU2 EKFs agree with the converged GSF yaw once the vehicle starts accelerating:

@Andrew_Tridgell We need to review the EKF health reporting and selection logic together.


Hello everyone,

Not sure if this is related, since it was quite a while ago and with version 4.07, but what @Hacky experienced was similar to what happened to us. We were flying in Altitude Hold mode with EKF2. We used the Intel T265 camera as a correction source for yaw, not altitude. As far as I know, in older versions of ArduCopter external navigation data, if present, was fused by default instead of the barometer. We changed that part of the code to force it to always fuse the barometer, since the camera’s Z position can cause problems.

However, we were aware that the X and Y position was still fused, and it was in fact able to make the first lane unhealthy. The algorithm therefore switched to the second IMU’s lane, since we had not enabled the third one, as shown below:

As can be seen below, the altitude differed greatly between lanes:

As a result, the drone jumped a few feet when EK2 switched lanes (see log) and kept rolling right until it crashed into a treeline. The pilot could not do anything at all, and engaging Land mode made no difference. We noticed that the altitude jump resulted in a commanded roll of -13 degrees, although the maximum lean angle was set to 10 degrees:

Please find the log in the link below:

We did not make a big deal out of this and decided to enable just one IMU. This was the second crash we had because of lane switching. Again, this is probably not related, but what @Hacky described sounds similar. The same happened on this flight, which was scarier because it was a flyaway. Although on this one we did not see the jump in desired pitch, the drone was uncontrollable:


@priseborough
Thank you for your detailed examination.

I suggest you also take a look at the second log from 7 July, which I linked at the end of the first post. There the drone was just sitting on the ground, and several lane switches also happened. Some time after the last lane switch, the POS estimates also began “floating”. In that case LOG_DISARMED was 1, and LOG_REPLAY was 1 for at least part of the time. I activated this after I noticed that the problem was arising again.

I will try to generate further logs with LOG_DISARMED and LOG_REPLAY activated. Yesterday the drone stood on the ground for about one hour, but there was no lane switch and no floating POS estimate. I do not really see a dependency, but yesterday it was only about 22°C, whereas on the other days it was around 30°C.

Just throwing out some ideas… Is that a wind turbine generator you flew near? Could that have affected the GPS signal, perhaps causing multipath or blocking a satellite or two that was in view? I usually configure my GPS with a minimum satellite elevation (I think 15°), as I take off near a powered electric fence, a solar array, and power lines.

I don’t know if this means anything, but in the log in the OP, GPA[0] SAcc looks as if it stops being updated, and the graph just connects the points with straight lines.

With all the flying we’ve done around utility assets I can’t imagine it making any meaningful difference in GPS reception. We regularly fly around high-kv lines without issue.


Good morning,

The IMU can be affected by changes in temperature. The Cube has an internal temperature control that can heat the IMU, but not cool it, using the power dissipated by some resistors:
In your logs, the temperature of your IMUs clearly did not match the BRD_IMU_TARGTEMP setpoint of 45 °C:


You may want to try the calibration procedure here if you notice a failure pattern correlated with the temperature:
https://ardupilot.org/copter/docs/common-imutempcal.html
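Because the heater can only raise the IMU temperature, it helps to check how much of a log sits above the setpoint, where the controller has no authority at all. A minimal sketch, assuming the IMU temperature samples have already been extracted from the log; the function name and the ±1 °C band are hypothetical choices, and the 45 °C default simply mirrors the BRD_IMU_TARGTEMP value quoted above.

```python
def temperature_summary(imu_temps_c, target_c=45.0, tol_c=1.0):
    """Fraction of IMU temperature samples below, within, and above the setpoint band.

    A heat-only controller can push the IMU up towards the target but cannot pull
    it down, so samples above the band are outside its control."""
    n = len(imu_temps_c)
    if n == 0:
        raise ValueError("no temperature samples")
    above = sum(1 for t in imu_temps_c if t > target_c + tol_c)
    below = sum(1 for t in imu_temps_c if t < target_c - tol_c)
    return {"below": below / n, "in_band": (n - above - below) / n, "above": above / n}
```

A flight that spends most of its time in the “above” bucket on a hot day is a candidate for the temperature calibration linked above.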

Yeah, I do too. I once hit a high-voltage transmission line, and right up until contact there was no effect.


Just out of curiosity, what happened during and after contact?

It was flying a mission, and one prop blade broke, sending it down into some trees. I have told a funny story about the incident: the tree was on a property where they kept peafowl, and it was dark by the time I found it. Those birds can wake the dead with their call. It’s better than a guard dog :smiley:

Nice! I’ve heard that some prop/motor/drone combinations can actually adapt and continue flying with 1 blade missing from a motor as long as they still have the other one or more on that motor. It’ll really give the bearings a workout, but supposedly it works for short periods of time to get to the ground.

The log showed some pretty crazy pitch and roll, but actually a not-so-controlled descent that didn’t do any more damage. The tree might have helped with that; it was stuck up in it.

@sbaccam
Yes, the temperature was one of my thoughts as well (as written earlier). It was a pretty warm day.

But the IMU temperature was still below 60°C and quite similar to the first flight (only about 2-3°C difference).

I guess there are quite a lot of builds out there where the Cube sits inside a separate enclosure together with other electronics, and this happened in Germany, where temperatures are still moderate even in summer compared to other countries.

It also does not explain the strong oscillations in the POS estimate, since the IMU input data still looked quite normal even in that phase.

Something else worth noting is that using the GSF for yaw whilst disarmed on the ground for long periods relies on the on-ground-not-moving check to stop the yaw drifting, and with no logging while disarmed I cannot determine whether this functionality is working as expected.
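For readers unfamiliar with the idea, an on-ground “not moving” check generally looks at whether rotation rates are near zero and the measured specific force is close to 1 g. The sketch below only illustrates that general idea; it is not ArduPilot’s implementation, and the function name and both thresholds are hypothetical.

```python
import math

STANDARD_GRAVITY = 9.80665  # m/s^2


def is_stationary(gyro_rad_s, accel_m_s2, gyro_thresh=0.05, accel_dev_thresh=0.3):
    """Return True if the vehicle looks still: low total rotation rate and an
    acceleration magnitude close to 1 g. Thresholds are illustrative only."""
    gyro_mag = math.sqrt(sum(w * w for w in gyro_rad_s))
    accel_mag = math.sqrt(sum(a * a for a in accel_m_s2))
    return gyro_mag < gyro_thresh and abs(accel_mag - STANDARD_GRAVITY) < accel_dev_thresh
```

If a check like this misfires while disarmed, yaw can drift unnoticed, which is exactly the behaviour that only disarmed logging would reveal.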

The Copter 4.1 beta6 updates have improved the robustness of the EKF3 calculations to numerical rounding error. A log with LOG_DISARMED=1 and LOG_REPLAY=1 would be required to determine with certainty that your issue was resolved by these changes.
