EKF3 Position going mad - Bug in 4.1.0-beta3 or defective Cube Orange?

Hacky · June 11, 2021, 7:05pm

On my last flight with ArduCopter 4.1.0-beta3 running on a Pixhawk Cube Orange, I saw some strange behaviour. Fortunately there was no crash, but it was probably critical.

It began with an EKF3 lane switch, after the “canonical” altitude behaved strangely while the copter was descending. Baro and GPS altitude changed as expected but POS.Alt (the canonical altitude) did not change. You can see it marked here:
[image]

Magnified:
[image]

The copter software seemed to realize this discrepancy which resulted in an EKF3 lane switch. You can see it here at 11:59.937:
[image]

As this happened during a guided mission, I decided to terminate the mission, switch back to PosHold and after that return with RTL mode.

After the vehicle safely landed, things got even worse. Horizontal position was jumping on the Mission Planner map accompanied by messages like “PreArm: GPS and AHRS differ by 32.0m” (with varying distance).

Here you can see a screen recording:

If you look at the POS.Lat, POS.Lng and POS.Alt values you can see these going mad:
[image]
Altitude already varied between -5769.93 and +717.38 meters…

The canonical position deviation was probably caused by ATT.Pitch and ATT.Roll also going mad after landing:

The copter was sitting on well-leveled pavement the whole time after landing and was not touched. If you look at the gyro data from all three IMUs, there is no visible reason why ATT.Roll and ATT.Pitch drifted away.

Fortunately the very strange things only happened on the ground after landing, but the POS.Alt issue and the EKF3 lane switch in flight were already frightening enough.

So the question is:
Was this caused by a defective Cube Orange or by bugs in Copter 4.1.0-beta3? I am using the “GPS for Yaw” feature but I do not see a direct dependency.
The weather was sunny and warm (about 27°C air temperature). The barometer temperature was around 55°C during that phase. In my opinion that is nothing scary (Pixhawk autopilots seem to work at operating temperatures up to 85°C).

You can download the whole logfile here: https://kopterkraft.com/downloads-static/2021-06-09%2011-55-03.bin

FRED_GOEDDERT · June 13, 2021, 12:25am

I have a small Quad-copter and just switched to 4.1.0 Beta3. Because of my eye problem I did a tiny hover behind the house (narrow place) to see how good it is. In Loiter it suddenly shot up a couple of metres on its own and I landed immediately.

I ran the .tlog file in Mission Planner and could see that the velocity indicator was building up slowly, and when it went orange the EKF message appeared in the Mission Planner HUD: “EKF3 IMU0 in-flight yaw alignment complete”, and a few seconds later “EKF primary changed:1”. The Position (Vert) indicator was also increasing but did not turn orange.
The quad has one of the early Pixracers installed. I knew the microSD card is old and no longer working. That’s all I can tell you.

Hacky · June 13, 2021, 9:22am

Thanks for your description. I also saw vertical position changes and increasing vertical speed sometimes in phases, where the drone was sitting on the table in my office (with no GPS reception). I do not exactly remember the messages but I think it was something like “Error pos vert variance”.

@rmackay9:
Can you give me a hint how the EKF “vert pos” is estimated and what could probably lead to these errors? GPS does not seem to be responsible here - at least on my workshop table, where I have no GPS. As external magnetometer I used a LIS3MDL first, later an IST8310, and now “GPS for Yaw”.

FRED_GOEDDERT · June 13, 2021, 10:26am

I forgot to mention that during that 5 minute hover I had 15-17 satellites and an HDOP of 0.6-0.8. GPS was really good. That burst of about 2m up lasted only a fraction of a second and I was able to land in Loiter quickly. It was also my first attempt after the FW update from 4.0.7.

tridge · June 16, 2021, 4:02am

what is happening is the barometer variances are collapsing to zero:

It is probably related to the larger value of EK3_ALT_M_NSE that has been set (4 instead of the usual 2), but even with that change we shouldn’t be seeing this.
@Hacky if you do any more testing then please set LOG_DISARMED=1 and LOG_REPLAY=1. That will give us a “replay” log, which would really help in fixing this issue.
Meanwhile, I’ll discuss with @priseborough
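
For anyone who prefers to script the two logging parameters requested above rather than set them in a ground station, here is a minimal sketch. The helper name `apply_params` and the dictionary name are made up for illustration; the only thing assumed about the connection object is that it offers a `param_set_send(name, value)` method, as pymavlink's `mavutil` connection objects do.

```python
# Parameters requested in this thread to obtain a "replay" log.
REPLAY_LOGGING_PARAMS = {"LOG_DISARMED": 1, "LOG_REPLAY": 1}


def apply_params(conn, params):
    """Send each parameter via conn.param_set_send(name, value).

    `conn` is any object with a param_set_send(name, value) method
    (assumption: a pymavlink mavutil connection). Returns the names sent,
    in order. This does not wait for or verify an acknowledgement.
    """
    sent = []
    for name, value in params.items():
        conn.param_set_send(name, float(value))
        sent.append(name)
    return sent
```

With pymavlink this might be used as `apply_params(conn, REPLAY_LOGGING_PARAMS)` after creating `conn` with `mavutil.mavlink_connection(...)` and calling `conn.wait_heartbeat()`; the connection string depends on your setup. Reading the parameters back afterwards is advisable, since the sketch does not confirm that they were accepted.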

tridge · June 16, 2021, 5:00am

Interestingly, the touchdown bit goes to 1 just before the problem starts at an altitude of 100m.

priseborough · June 17, 2021, 12:51am

The touchdown bit appears to be coincidental. I’ve submitted a PR https://github.com/ArduPilot/ardupilot/pull/17787 that will make the code more robust to a collapse of the vertical velocity state variance that was the trigger event. The root cause of the vertical velocity variance collapse is still unconfirmed, but one hypothesis is that it is related to fusion of the GPS yaw on the same frame as the GPS position and velocity data and with a small angular accuracy value.

priseborough · June 17, 2021, 1:08am

There is another log that shows the same pattern - also using moving baseline GPS for yaw:

vertical velocity state variance collapses to 0, then vertical position variance follows:

vertical velocity and position then diverge:

The following steps are required:

1. Obtain a replayable log that demonstrates the behaviour.
2. Investigate increasing the minimum value of GPS yaw accuracy used in the fusion operation from the current value of 5 deg.
3. Investigate moving the yaw fusion to a different filter time step to the GPS position and velocity fusion.
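
To illustrate why a state variance collapsing to zero is harmful, here is a toy one-dimensional Kalman filter. It is only a sketch: the real EKF3 has many coupled states, and the variance floor used here is a generic safeguard chosen for illustration, not necessarily what the PR does. All names and numbers are made up.

```python
def run_scalar_kf(measurements, p0, q, r, p_floor=0.0):
    """Toy 1-D Kalman filter tracking a constant altitude.

    p0: initial state variance, q: process noise variance,
    r: measurement noise variance, p_floor: minimum allowed state variance.
    Returns the list of altitude estimates after each update.
    """
    x, p = 0.0, p0
    estimates = []
    for z in measurements:
        # Predict: variance grows by q, then is clamped to the floor.
        p = max(p + q, p_floor)
        # Update: the gain is the share of the innovation that gets applied.
        k = p / (p + r)
        x += k * (z - x)
        p *= 1.0 - k
        estimates.append(x)
    return estimates


baro = [10.0] * 500  # measurements say 10 m; the estimate starts at 0 m

# Variance stuck at zero: the gain is zero, so measurements are ignored.
collapsed = run_scalar_kf(baro, p0=0.0, q=0.0, r=4.0)
# Same start, but the variance is never allowed below a small floor.
floored = run_scalar_kf(baro, p0=0.0, q=0.0, r=4.0, p_floor=0.1)

print(f"collapsed variance: final estimate {collapsed[-1]:.2f} m")
print(f"with variance floor: final estimate {floored[-1]:.2f} m")
```

With zero variance the filter is fully confident in its own state and stops listening to the sensor, so any error in the prediction goes uncorrected. Keeping the variance above a floor lets the measurements pull the estimate back.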

rmackay9 · June 17, 2021, 8:00am

Great, thanks for this analysis.

I think step 3 is recorded in this GPS-for-yaw improvements issue (see 2nd item) so now we have another reason to do it.

Hacky · June 18, 2021, 10:47am

@rmackay9, @tridge, @priseborough
Thanks for taking care of that issue. As you may have noticed, LOG_DISARMED was set to 0 during the flight and when I saw that strange behaviour on ground, I set it to 1 and this seemed to have an immediate effect (but with a certain gap between landing/disarmed and starting further recording).

Regarding the parameter EK3_ALT_M_NSE: You are right, I have set this to 4 because the drone was way too nervous in keeping its altitude in windy conditions (1-2 bft ok but already getting nervous at 3 bft with some gusts - which still is not much). When I used EKF2 with ArduCopter 4.0.7, the value 4 was a good setting for EK2_ALT_M_NSE in my case (where 3 was the default) and the behaviour also seemed OK with EK3_ALT_M_NSE = 4. Even if it were a bit lax now, I still see no reason why it should behave like we have seen. If EK3_ALT_M_NSE had to be left at its default (2), there would be no way left for me to get better altitude stability.

Next time I will set LOG_DISARMED = 1 already from the beginning.

Regarding: LOG_REPLAY=1:
How much CPU performance will this eat? As you know, in the current implementation the RTCM message traffic between GPS1 and GPS2 goes through the flight controller (acting as a proxy for “GPS for yaw”), and this already consumes CPU performance. If the amount of logged data increases significantly during flight, I am also a bit anxious that the SD card cannot cope with it (it is the card that was included with the Cube Orange at delivery) and that I cause trouble only by logging…

tridge · June 21, 2021, 12:36am

very little. The RTCM issue is not about CPU btw, it is about DMA contention between the uarts carrying the RTCM data and the other peripherals. I am separately working on a method to fix that.
Your CPU load in the above log is around 30%, so about 70% of the time the CPU is completely idle. Your uarts are working hard however.

rmackay9 · June 21, 2021, 12:57am

@Hacky,

Re the drone being nervous in keeping altitude (I interpret this as meaning “jumpy”).

It might help to reduce the PSC_ACCZ_P and I values a little more instead of modifying the EKF. I suspect you’ve already seen and followed the altitude hold instructions on the Tuning Process Instructions wiki page but it may help to reduce the values a little more.

It seems like this vehicle is very high powered with MOT_THST_HOVER of only 0.135. So you could reduce PSC_ACCZ_P to 0.15 (ish) and PSC_ACCZ_I to 0.3.
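
As a rough illustration of where values like these come from, the sketch below applies the rule of thumb, as I understand it from the ArduPilot tuning guide, of setting PSC_ACCZ_P to about MOT_THST_HOVER and PSC_ACCZ_I to about twice that. Treat the rule as an assumption and check the wiki page before relying on it; the function name is made up.

```python
def suggested_accz_gains(mot_thst_hover):
    """Rule-of-thumb starting points for the vertical acceleration controller.

    Assumption: P is about MOT_THST_HOVER and I is about 2 x MOT_THST_HOVER.
    """
    return {
        "PSC_ACCZ_P": round(mot_thst_hover, 3),
        "PSC_ACCZ_I": round(2.0 * mot_thst_hover, 3),
    }


gains = suggested_accz_gains(0.135)  # hover throttle quoted in this thread
print(gains)
```

For a hover throttle of 0.135 this lands close to the 0.15 (ish) and 0.3 suggested above.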

Hacky · June 21, 2021, 9:23am

@rmackay9

Yes, currently the vehicle is overpowered. It will get a larger battery soon and at that point I will go through the tuning process again.

rmackay9 · June 23, 2021, 4:27am

We’ve just merged this fix which we think will resolve the issue that started this thread. This will be included in Copter-4.1.0-beta5 which will be released within a week or so.

Hacky · June 27, 2021, 6:40pm

Thank you for the info.

Does it mean, I can continue using EK3_ALT_M_NSE = 4 ?

rmackay9 · June 28, 2021, 1:04am

@Hacky,

Yes, I think you can leave EK3_ALT_M_NSE = 4 although I personally wonder if it actually helps …

Hacky · June 28, 2021, 6:59am

With 4.0.7 and EKF2 it helped (EK2_ALT_M_NSE).

It is not the “jumpiness” that you mentioned (which of course could be calmed by PID tuning). It is really a strong dependency on small barometric pressure changes. With EK2_ALT_M_NSE = 4 I got better results; with EK2_ALT_M_NSE = 6 it already depended too much on GPS altitude variance (which may be ok if you have “rtk fixed” status permanently).

But here we talk about EKF3 and I do not know, how much it differs from the EKF2 behaviour.

rmackay9 · June 28, 2021, 7:21am

@Hacky,

OK. Just one thing to remember is that the EKx_ALT_M_NSE essentially controls the balance between IMU and whatever the EK3_SRC1_POSZ has been set to. So by default it would control the balance between IMU and barometer. It would not help to control the balance between baro and GPS because the EKF will never fuse altitudes from both of these sensors at the same time.
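
A toy scalar example may make this balance concrete. It assumes the noise parameter is a standard deviation in metres that enters the filter as a variance (its square), and it uses a made-up state variance of 1 m²; neither the function name nor the numbers come from the EKF3 code.

```python
def measurement_weight(state_variance, alt_noise_m):
    """Scalar Kalman gain: fraction of the innovation applied to the state."""
    r = alt_noise_m ** 2  # assumed: noise parameter is a standard deviation in metres
    return state_variance / (state_variance + r)


for noise in (2.0, 4.0):  # EK3_ALT_M_NSE default and the value used in this thread
    weight = measurement_weight(1.0, noise)
    print(f"noise={noise:.0f} m: altitude measurement weight {weight:.3f}, "
          f"IMU prediction weight {1.0 - weight:.3f}")
```

A larger noise value lowers the weight given to the selected altitude source and leaves more of the estimate to the IMU-driven prediction, whichever sensor EK3_SRC1_POSZ points to.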

Hacky · June 28, 2021, 3:18pm

@rmackay9

Thanks for clarification - but in that case you should correct the parameter documentation (https://ardupilot.org/copter/docs/parameters.html). It says:

“This is the RMS value of noise in the altitude measurement. Increasing it reduces the weighting of the baro measurement and will make the filter respond more slowly to baro measurement errors, but will make it more sensitive to GPS and accelerometer errors.”

So it clearly also mentions GPS and not only IMU accelerometer fusion. That was the reason why I assumed that the GPS altitude is also fused into the “canonical” altitude. And I am pretty sure that I saw height changes that followed the GPS altitude variations significantly more when I was using EK2_ALT_M_NSE = 6 - but still not as strongly as I saw when I tested EK2_ALT_SOURCE = 2 (primary source is GPS in that case).

With EKF3, you (still) have EK3_ALT_SOURCE and now also EK3_SRC1_POSZ - which also is quite confusing…

rmackay9 · June 28, 2021, 11:16pm

@Hacky,

I’ll fixup that comment, txs for that.

There’s no longer an EK3_ALT_SOURCE parameter in 4.1 so I wonder a bit where you’re seeing that.