Yaw (and plane) out of control in auto modes (ATT.Yaw)

RogerR · November 12, 2018, 4:23pm

Flew an auto mission the other day and the plane went crazy. After hitting a few waypoints, it erratically headed in the wrong directions for about two minutes. I switched to RTL and the erratic course continued. After a couple more minutes, I switched to FBW-A and landed.

Replaying the mission from the tlog, there are many warnings about compass errors. However, as this graph shows, the two compasses (red and green lines) track each other well and move consistently. But the ATT.Yaw value (yellow line) suddenly jumps about 140 degrees (with an "EKF2 IMU0/1 aligned to GPS velocity" message) and stays divergent for about 20 seconds. It then follows the compass heading for ~15 seconds and jumps again (140 degrees, with an "EKF2 IMU0/1 switching to Compass 1" message), diverging for another 20 seconds. The IMU error messages stop, but sudden divergences from the compasses continue for the rest of the flight.

The log for the flight can be downloaded at https://drive.google.com/open?id=102XR-aZ_ghyLry_TV-1uHo0xI6J0yg80
What should I look at next?

hunt0r · November 12, 2018, 5:24pm

I might expect this behavior if your compasses were not well calibrated. Could that be the problem?

RogerR · November 12, 2018, 6:27pm

I’ve probably flown this mission loop 100s of times in the last 9 months and the calibration for the compasses was previously not a problem. However, it has been about 3 months since I last flew. And the environmental conditions were slightly different (45 degrees F vs 80-90 F in the past).

But if this was the result of the compasses being off, I don’t understand the navigation behavior. At the time that things go bad, the plane is headed due south (according to both compasses and ATT.Yaw). A few seconds later, the compasses still say South…but ATT.Yaw suddenly says NNE. Since the plane is in level flight (ATT.roll is zero when the divergence of yaw starts), the ATT.Yaw value appears bad to me.
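Comparing ATT.Yaw with a compass heading needs a wrapped difference; a naive subtraction makes 350° vs 10° look like a 340° disagreement. A minimal Python sketch, assuming both values are headings in degrees clockwise from north (the helper name `heading_diff_deg` is hypothetical, not an ArduPilot or Mission Planner function):

```python
def heading_diff_deg(a_deg: float, b_deg: float) -> float:
    """Signed smallest difference a - b in degrees, wrapped to [-180, 180)."""
    # Shift by 180, wrap into [0, 360) with Python's non-negative modulo, shift back.
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0


# Compass says due south (180 deg), ATT.Yaw says NNE (22.5 deg).
print(heading_diff_deg(22.5, 180.0))  # -157.5
# Wrap-around case: 10 deg vs 350 deg is a 20 deg difference, not 340.
print(heading_diff_deg(10.0, 350.0))  # 20.0
```

The sign says which way ATT.Yaw is off; the magnitude is what to compare against jumps like the ~140 degrees seen in the log.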

Do you know why the EKF would switch the IMU yaw alignment to GPS velocity?

RR

RogerR · November 12, 2018, 7:02pm

I’m now reading about EKF innovations. (Innovations are the difference between the value predicted from the IMU data, before corrections are applied, and the value measured by the sensor.) So I added the Normalized Magnetometer Innovations on top of the previous graph:

I believe that this shows that the IMUs suddenly believed that there was a big error in the compass data. Since the compass data is stable at this point, the sudden magnitude of the error argues that something dramatic must have changed at this moment in time.
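In generic Kalman-filter terms, an innovation is the measurement minus the filter’s prediction, and a normalized innovation divides it by the standard deviation the filter expected for that innovation. A minimal Python sketch of the textbook idea (not ArduPilot’s code; the logged EKF2 values may use a different scaling, for example one that folds in the innovation gate):

```python
import math


def innovation(measured: float, predicted: float) -> float:
    """Innovation: what the sensor reported minus what the filter predicted."""
    return measured - predicted


def normalized_innovation(measured: float, predicted: float, innov_variance: float) -> float:
    """Innovation magnitude in units of its expected standard deviation."""
    return abs(innovation(measured, predicted)) / math.sqrt(innov_variance)


# A stable sensor reading can still produce a large normalized innovation
# if the filter's *prediction* has moved away from it.
print(normalized_innovation(measured=3.0, predicted=1.0, innov_variance=4.0))   # 1.0
print(normalized_innovation(measured=3.0, predicted=-5.0, innov_variance=4.0))  # 4.0
```

A large value only says that measurement and prediction disagree; it does not say which one is wrong, which is how a steady compass can still show a huge innovation.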

RR

hunt0r · November 12, 2018, 7:17pm

Yes, I also see something suspicious. Plot the GPS latitude and longitude (as the GPS sensor reported) vs the POS latitude and longitude. (The POS is calculated from the EKF.)

Look at the disagreement. Something was causing the EKF to diverge away from the GPS, and then “jump” back to it. Does anyone know how to investigate this?

RogerR · November 12, 2018, 8:04pm

Hunt0r,

You’re helping a lot!

It appears likely that the compass errors are erroneously created by the internal flight controller position jumping to an incorrect value. And your charts clearly show that isn’t justified by the GPS values at that time.

Since we think the compass isn’t the cause of the incorrect POS, and the GPS isn’t the cause of the incorrect POS either, it must be something else (or my Pixhawk has an internal hardware problem). Do you know what else goes into changing the POS? Accelerometers, speeds, ???

hunt0r · November 12, 2018, 8:45pm

Yes, lots of sensor information is fed into the EKF. Someone who is more experienced than me in looking at the EKF could probably spot the problem fast.

hunt0r · November 12, 2018, 8:47pm

A small, but important point: I observe the EKF Lat/Lng to slowly diverge from the GPS, and then to “jump” back in agreement. (Not to “jump away” to an incorrect value.)

RogerR · November 13, 2018, 12:47am

You’re probably correct on that, but it diverges fast enough to aggravate me.

This happened on my first night flight ever…about 1/4 mile away. So I was extremely happy to get it back in FBW-A to a successful landing, especially since I also got an RC failsafe and the lights went out for a couple of seconds.

RR

RogerR · November 15, 2018, 5:17pm

Judging from the time scale on your graphs, it looks like it diverges for a period of about 5-10 seconds and then comes back to reality in a second or less. I still have no idea why.

Since the accelerometers and the flight controller are both on the PixHawk, replacing the PixHawk might help. But I’d still love to hear from someone who knows how the EKF could do this.

RR

RogerR · November 26, 2018, 4:17pm

I’ve gone back and looked at the IMU data and don’t see anything anomalous before the first upset (or after). So, no sudden changes in either compass xyz, GPS lat/long/alt or any axis of the IMU gyros or accelerometers.

There are a couple of spikes in the GPS.GCrs parameter just before the time of the first upset. I’m uncertain of the parameter definition or significance. It’s a weak correlation, since the upsets continued for a few more minutes and I don’t see any other matching spikes in the GPS.GCrs parameter.

But if the GPS.GCrs parameter is the GPS course and if the EKF uses that GPS message for navigation, I could see it creating some confusion. The following plot shows the period of the first upset (where the red line spikes up). The blue and yellow lines are x and y compass readings. The green line is the GPS.GCrs.

The compass lines are stable prior to the upset. But the green line (GPS.GCrs) changes from 185 to 320 just prior to the upset.

Does the EKF for Arduplane use the GPS heading? Does it need the GPS heading…or could I reprogram my GPS to stop sending the messages with the GPS course (GxVTG, GxRMC)?

RR

RogerR · November 28, 2018, 9:53pm

I’ve buckled down to learn about the EKF data and attempted a new analysis. The tutorial for the EKF at http://ardupilot.org/dev/docs/extended-kalman-filter.html#extended-kalman-filter is very helpful.

EKF3 contains all of the “innovations” for the model. Innovations are the differences between the model and the subsequent sensor data. My first graph plots the EKF3 values for IVN and IVE (GPS velocity vs. model velocity). This looks almost like a square wave with a 12-15 second period during the “upsets” in navigation.

The EKF tutorial notes that “These are an important measure of health for the navigation filter. If you have good quality IMU and GPS data they will be small and around zero.” I read this as saying that only the IMU and GPS data are involved here.

My 2nd graph shows IPN and IPE added (GPS position North and East vs model position):

At the beginning of each 12-15 second period, the positions match. But they drift away for the duration of the 12-15 seconds…and then snap back to match. The drift direction matches the direction of the excessive velocity.

Could the compass still be involved? It’s doubtful. As shown in the following graph (where I overlay IMX and IMY), there is a gigantic innovation for the magnetic sensors at the beginning of this sequence. But it’s so big that the EKF stops using the compass for navigation…therefore it can’t be a cause of the subsequent errors:

The next graph, from NKF4 (EKF4), shows SV and SP. These are estimates of GPS velocity and position errors respectively.

Notice that the velocity error occurs and then the position drifts. I conclude that the position drifts off because the velocity was wrong. In other words, the position error is an effect of the velocity error…not the root cause.

If I look at the GPS.GCrs value (and treat it as the GPS course) for the first 10 seconds of this problem, I see impossible values (shown here in red).

The course changes from due south to almost north (330 degrees) in less than 1/4 second. 4 seconds later, it goes from east to north in 1/4 second. It’s a big plane limited to 45 degree bank angle turns, so moves of that magnitude aren’t possible.
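The “impossible course change” argument can be made quantitative with the coordinated-turn relation ω = g·tan(φ)/V. A Python sketch, assuming the 45° bank limit from the post and, as a placeholder because the thread never states the plane’s speed, 20 m/s; it ignores wind, and the function names are hypothetical:

```python
import math

G = 9.80665  # standard gravity, m/s^2


def max_turn_rate_deg_s(bank_deg: float, speed_m_s: float) -> float:
    """Coordinated-turn rate omega = g * tan(bank) / V, converted to deg/s."""
    return math.degrees(G * math.tan(math.radians(bank_deg)) / speed_m_s)


def course_change_plausible(change_deg: float, dt_s: float,
                            bank_deg: float, speed_m_s: float) -> bool:
    """True if a course change of change_deg could be flown within dt_s."""
    return abs(change_deg) <= max_turn_rate_deg_s(bank_deg, speed_m_s) * dt_s


rate = max_turn_rate_deg_s(45.0, 20.0)
print(f"max turn rate: {rate:.1f} deg/s, {rate * 0.25:.1f} deg per 0.25 s")
# Due south (180) to 330 deg is a 150 deg change in 0.25 s.
print(course_change_plausible(150.0, 0.25, 45.0, 20.0))  # False
```

Even at a much lower assumed speed (say 5 m/s) the limit is only a few tens of degrees per quarter second, so the conclusion does not hinge on the exact airspeed.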

Both the GPS and the IMU (part of the PixHawk) have worked well without these errors over the last year. But I can replace the GPS far more easily and it’s less money. So, I’ll try replacing it.

RR

hunt0r · November 29, 2018, 2:23pm

I’m just now getting a chance to read your posts. Great work! I’m not sure if any of this is helpful, but just in case…

Yes, it is. It is the “float ground_course;” member of the AP_GPS class (Line 132 of AP_GPS.h)

Interesting note: Its calculation may depend on what GPS unit you have. I see that some GPS drivers use atan2(vel.y, vel.x) to calculate it from velocity, while others read the information directly from the GPS. (I am not surprised that some GPS units may calculate this internally, and ArduPilot just uses it as-is.)
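For drivers that derive the course from velocity, the calculation described above looks roughly like this Python sketch. It assumes `vel.x` is the north component and `vel.y` the east component (ArduPilot’s NED convention); the wrap to [0, 360) is added for readability and may not match how a given driver stores the value:

```python
import math


def ground_course_deg(vel_north: float, vel_east: float) -> float:
    """Course over ground, degrees clockwise from north, wrapped to [0, 360)."""
    # atan2(east, north) measures the angle from north toward east.
    return math.degrees(math.atan2(vel_east, vel_north)) % 360.0


print(f"{ground_course_deg(0.0, 15.0):.1f}")   # 90.0  (flying east)
print(f"{ground_course_deg(-15.0, 0.0):.1f}")  # 180.0 (flying south)
print(f"{ground_course_deg(0.0, -15.0):.1f}")  # 270.0 (flying west)
```

One consequence: when course comes from velocity, a bad velocity and a bad course are the same fault, not two independent ones.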

@RogerR Out of curiosity, what GPS unit are you using? (I don’t see if you’ve ever told us)

I’d like to know the answer to these questions, too. Does anyone know?

I know how EKF’s work, and I think your conclusion of “only IMU and GPS data are involved here” is not correct. (The tutorial comment isn’t meant to be read so precisely… it’s just giving the ‘gist’ of the idea.)

Yes! This corresponds to the behavior I observed above as well with my plots titled “EKF drifts away from GPS and ‘jumps back’”.

I agree. Good conclusion.

Based on your excellent analysis, I’m also persuaded to check if the GPS is doing something suspicious. I’ll take a look and post what I find.

Are any ArduPilot EKF experts following this post? Do you see something we’re missing? I’m not even sure who to tag. @priseborough @WickedShell @tridge

hunt0r · November 29, 2018, 3:26pm

I found an idea which may or may not be relevant: The GPS is (for some reason) reporting Speed and Course which are slightly different than its own reports of Lat/Lng position in time.

Here’s my details:

I took GPS.Lat and GPS.Lng and converted them to positions in meters N and meters E (from the first GPS reading) via a spherical earth model with radius 6.3781*10^6 m. (I’m pretty sure this is the model used inside ArduPilot, too. I’ll verify with a code reference if anyone cares.)
I did a first-order speed and course calculation on this gps_in_NE.

[Details: If (T1,N1,E1) are the first (time, posN, posE) triplet and (T2,N2,E2) are the second, form dT=T2-T1, dN=N2-N1, dE=E2-E1. Then my_speed = sqrt((dN/dT)^2+(dE/dT)^2) and my_course=wrapto360(90-atan2d(dN/dT, dE/dT)).]
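The calculation above can be sketched in Python as follows. The conversion assumes a local flat-earth approximation around the first fix (adequate over a flying field), with the east offset scaled by cos(latitude); the function names are hypothetical and hunt0r’s own script may differ in detail:

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6.3781e6  # spherical-earth radius from the post


def latlng_to_ne(lat_deg: float, lng_deg: float,
                 lat0_deg: float, lng0_deg: float) -> Tuple[float, float]:
    """North/east offsets in metres from a reference fix (small-area flat-earth approximation)."""
    north = math.radians(lat_deg - lat0_deg) * EARTH_RADIUS_M
    # Longitude lines converge toward the poles, so scale east by cos(latitude).
    east = math.radians(lng_deg - lng0_deg) * EARTH_RADIUS_M * math.cos(math.radians(lat0_deg))
    return north, east


def speed_and_course(samples: List[Tuple[float, float, float]]) -> List[Tuple[float, float]]:
    """First-order speed (m/s) and course (deg) between consecutive (t, north, east) samples."""
    out = []
    for (t1, n1, e1), (t2, n2, e2) in zip(samples, samples[1:]):
        dt = t2 - t1
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        vn, ve = (n2 - n1) / dt, (e2 - e1) / dt
        speed = math.hypot(vn, ve)
        # Same formula as above: wrapto360(90 - atan2d(dN/dT, dE/dT))
        course = (90.0 - math.degrees(math.atan2(vn, ve))) % 360.0
        out.append((speed, course))
    return out


track = [(0.0, 0.0, 0.0), (1.0, 0.0, 10.0), (2.0, -5.0, 10.0)]
for spd, crs in speed_and_course(track):
    print(f"{spd:.1f} m/s, {crs:.1f} deg")  # 10.0 m/s, 90.0 deg, then 5.0 m/s, 180.0 deg
```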

Take a look at plots of GPS.Spd vs my_speed, and GPS.GCrs vs my_course:

[Image: gps_spd_crs.png, GPS.Spd vs my_speed and GPS.GCrs vs my_course]

You don’t have the benefit of zooming in, but my calculation disagrees with the GPS often by 2[m/s] or more for sustained periods (~10sec) of time. I’m surprised by this. The direction is similarly off by say… 20[deg] sometimes. (Also, I did a separate calculation including altitude-change in the speed calc, making it 3D speed. It did not change the discrepancy so I’m not showing it here.)
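Spotting the sustained disagreements by eye is tedious; a small Python helper can list them. This is a sketch, assuming the reported and recomputed series have already been resampled onto the same timestamps (the helper name is hypothetical). For course, pass a wrapped angle difference as `diff` so that 359° vs 1° counts as 2°:

```python
from typing import Callable, List, Sequence, Tuple


def sustained_discrepancies(times: Sequence[float],
                            reported: Sequence[float],
                            computed: Sequence[float],
                            threshold: float,
                            min_duration_s: float,
                            diff: Callable[[float, float], float] = lambda r, c: r - c,
                            ) -> List[Tuple[float, float]]:
    """(start, end) spans where |diff(reported, computed)| > threshold for at least min_duration_s."""
    spans = []
    start = last = None
    for t, r, c in zip(times, reported, computed):
        if abs(diff(r, c)) > threshold:
            if start is None:
                start = t  # a new run of disagreement begins
            last = t
        elif start is not None:
            if last - start >= min_duration_s:
                spans.append((start, last))
            start = None
    if start is not None and last - start >= min_duration_s:
        spans.append((start, last))  # run still open at the end of the log
    return spans


t = list(range(10))
gps_spd = [10] * 10
my_speed = [10, 10, 7, 7, 7, 7, 10, 10, 7, 10]
print(sustained_discrepancies(t, gps_spd, my_speed, threshold=2, min_duration_s=3))  # [(2, 5)]
```

On the real log, `threshold=2.0` and `min_duration_s=10.0` would mirror the 2 m/s, ~10 s disagreements described above.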

I still don’t know if this effect might be enough to disrupt the EKF’s estimate so badly. Could the EKF be confused by receiving slightly inconsistent Speed/Course info from a GPS?

@RogerR A fundamental question… how securely is your GPS attached? If it could “wobble” during flight, and it has built-in IMU’s and Compasses, that might be a problem. But that’s just a wild idea.

RogerR · November 30, 2018, 4:35am

Hunt0r, the GPS is (supposed to be) a u-blox NEO-M8N. I say “supposed to be” since I ordered it from Banggood and I understand there could be some clones from that part of the world. It works pretty well: I routinely get 9-12 satellites in my house and it gets a fix more quickly than any other GPS that I’ve used.

Since the GPS has no other source of data beyond past GPS fixes, it must be calculating the course from the set of previous positions. Arduplane software could do the same calculation itself (from the last few positions) rather than getting the derivation from the GPS. Whether it’s u-blox or ArduPilot software, intermittent errors seem odd.

I ordered another GPS unit last night.

Since I’m running EKF2, I should have looked at the tutorial about it at http://ardupilot.org/dev/docs/ekf2-estimation-system.html. One immediately useful data point from this is that I can see the SV and SP (velocity and position error estimates) for both IMUs. And the graph for NKF9 SV and SP matches the last one posted above for NKF4. So, unless the gyros for both IMUs went crazy at once, there’s yet another finger pointing at bad velocity data from the GPS.

Another thing I noticed that’s clear in the graphs you provided on the position variance is that the magnitude of the EKF velocity during each period of divergence closely matches the magnitude of the GPS velocity. It’s only the direction of that divergence that varies between the two. And in most (if not all) of the cases, the EKF model’s error is that it continues to track the previous course too long.

Perhaps the GPS is providing stale GPS.GCrs updates…or ardupilot is processing the updates a bit too late?
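One way to test the stale-data idea is to check whether GPS.GCrs (or GPS.Spd) lines up better with the position-derived values when shifted back in time. A brute-force Python sketch of a generic lag search (not an ArduPilot tool), assuming both series are sampled at the same fixed rate. For course data, unwrap the angles first, and note that a finite-difference estimate assigned to the later timestamp already lags by about half a sample:

```python
from typing import Sequence


def best_lag(reference: Sequence[float], delayed: Sequence[float], max_lag: int) -> int:
    """Shift (in samples, 0..max_lag) that best aligns `delayed` with `reference`.

    If delayed[k] == reference[k - L], comparing reference[i] with delayed[i + L]
    gives zero error, so the search returns L.
    """
    best, best_err = 0, float("inf")
    for lag in range(max_lag + 1):
        pairs = list(zip(reference, delayed[lag:]))
        if not pairs:
            break  # shift longer than the data
        err = sum((a - b) ** 2 for a, b in pairs) / len(pairs)  # mean squared error
        if err < best_err:
            best, best_err = lag, err
    return best


ref = [0, 0, 1, 5, 9, 4, 2, 0, 0, 0]
late = [0, 0, 0, 0, 1, 5, 9, 4, 2, 0]  # same shape, two samples later
print(best_lag(ref, late, max_lag=4))  # 2
```

Multiply the result by the sample period to get seconds; a consistent lag well above half a sample would support the “stale updates” explanation.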

RR

RogerR · November 30, 2018, 8:02pm

GPS is stuck to plane with double sided sticky stuff…and it’s about 2" in diameter, so that’s plenty to stick it down really well. Zero chance of wobble independent of plane.

I see evidence in your latest graphs that GPS speed is more of a contributor to the problem than course.

There’s a parameter (EK2_GLITCH_RAD) that can be set in plane to control the maximum radial uncertainty in position between the value predicted by the filter and the value measured by the GPS before the filter position and velocity states are reset to the GPS. Making this value larger allows the filter to ignore larger GPS glitches, but also means that non-GPS errors such as IMU and compass can create a larger error in position before the filter is forced back to the GPS position. The default value for the parameter is 25 meters. Thus, with the speed off by 2 meters per second, one would hit the reset threshold after about 12.5 seconds (25 m / 2 m/s).

This matches the data…where the snapback always seems to occur about 12-15 seconds after the divergence begins. So I think the answer to your question is that 2 meters per second is enough to create a problem.

Given my problem, I have reduced this value to 10 meters (which is the lowest value offered). I think this will reduce the length and magnitude of the divergence.
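The timing argument can be checked with a toy model: a constant velocity error integrates into a growing position error, and the filter snaps back to the GPS once that error reaches EK2_GLITCH_RAD. This Python sketch assumes a constant 2 m/s error and an instantaneous reset at the radius (the function name is hypothetical); the real EKF2 logic also weighs innovation variances, so treat the numbers as rough:

```python
def reset_times(velocity_error_m_s: float, glitch_radius_m: float,
                dt_s: float, duration_s: float) -> list:
    """Times (s) at which accumulated position error reaches the glitch radius (toy model)."""
    times = []
    err = 0.0
    for k in range(1, int(round(duration_s / dt_s)) + 1):
        err += velocity_error_m_s * dt_s  # position error grows linearly with velocity error
        if err >= glitch_radius_m:
            times.append(k * dt_s)
            err = 0.0  # assumed: filter position/velocity reset to the GPS
    return times


print(reset_times(2.0, 25.0, 0.5, 40.0))  # default 25 m: [12.5, 25.0, 37.5]
print(reset_times(2.0, 10.0, 0.5, 20.0))  # 10 m: [5.0, 10.0, 15.0, 20.0]
```

Under this model the default radius gives a snap-back roughly every 12.5 s, in line with the observed 12-15 s cycle, and 10 m cuts both the period and the peak position error to 40% of that.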

There is another parameter (EK2_VELNE_M_NSE) that tells the EKF to avoid trusting velocity information from the GPS as much. Unfortunately, the description says that this parameter is only used if the GPS does not provide a speed accuracy estimate. I’ve reduced it from .5 meters/sec to .1, but I doubt it will have any effect.

Naterater · December 1, 2018, 4:30pm

This all seems like it could be caused by a COMPASS_ORIENT parameter being set incorrectly. Does your plane always show an accurate heading when on the ground and connected to a GCS? The newest releases of plane have an automatic orientation detection parameter.

RogerR · December 1, 2018, 6:11pm

Nathan,

If you look at the 2nd graph, way up at the top of this thread you can see that compass readings are hardly changing at all until after the problem manifests itself. The EKFs quickly decide not to trust the compass since the heading doesn’t make sense…but that’s only because the EKF thinks it is somewhere that it is not.

The scale for compass innovations is on the right side of that chart. Once the value goes above 1, the EKF doesn’t use the compass for position determination. So the compass is very good (<.2) before the problem shows up and so bad (>1) that it’s never used after the problem shows up.

I’m pretty sure we’ve narrowed it down to bad (or stale) GPS velocity data being processed by the EKF. Either it’s being:

calculated incorrectly by the GPS, or
delayed in transmission (in the GPS or in the PixHawk), or
processed incorrectly by the EKF.

I’m leaning towards one of the first two possibilities, based on hunt0r’s last set of graphs.

I can replace the GPS. However, this situation does point out that the EKF is unprotected against bad GPS velocity inputs when the GPS also provides error estimates for velocity that are also incorrect. Since the EKF programs the GPS and trusts its error estimates for velocity, there’s no parameter that can tell the EKF to be skeptical of the GPS velocity inputs.

RR

RogerR · December 1, 2018, 7:25pm

Perhaps I should add the new GPS as a 2nd one rather than replacing the original. This might provide the opportunity to get some smoking-gun telemetry.

RogerR · December 4, 2018, 4:52pm

Two new pieces of information.

I flew again and collected another set of data. There were no unacceptable GPS position innovations. However, there were two periods with velocity innovations above 1 during flight (and dozens above .5).
I found a posting at https://github.com/ArduPilot/ardupilot/issues/4450 where priseborough described the *_M_NSE and *_I_GATE parameters:

There are two parameter types. The *_M_NSE parameters set the minimum value of noise that the GPS fusion will be allowed to use regardless of what the receiver reports. This is there to protect against receivers that are overly optimistic. The *_I_GATE parameters set the number of percentile standard deviations allowed for the innovation before it fails consistency checks and the measurement is rejected. The size of the gate therefore adjusts to the reported accuracy of the GPS, however the minimum accuracy is always bounded by the relevant *_M_NSE parameter.

This is a bit different from what I expected for EK2_VELNE_M_NSE. The text in Mission Planner says that this parameter “sets a lower limit on the speed accuracy reported by the GPS”. From priseborough’s description, I’d rather say that it “limits the estimate of speed accuracy in setting GPS horizontal velocity observation noise”.

In any event, I should be able to change the value (currently at .1) to a higher value to make the EKF trust the GPS velocity measurements less.
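Reading priseborough’s description literally, the gate check can be sketched as below. This is a simplified illustration of that description, not EKF2’s source: the arguments stand in for EK2_VELNE_M_NSE and EK2_VEL_I_GATE, a gate of 500 is taken to mean 5 standard deviations, and the filter’s own velocity uncertainty is passed in as a single variance:

```python
def passes_velocity_gate(innovation_m_s: float, reported_accuracy_m_s: float,
                         m_nse_m_s: float, state_variance: float,
                         i_gate_percent: float) -> bool:
    """Innovation consistency check with a noise floor (simplified sketch)."""
    # *_M_NSE bounds the receiver's reported accuracy from below.
    obs_sigma = max(reported_accuracy_m_s, m_nse_m_s)
    # Expected innovation variance: measurement noise plus filter uncertainty.
    innov_variance = obs_sigma ** 2 + state_variance
    # *_I_GATE is in percent of a standard deviation, e.g. 500 -> 5 sigma.
    gate_sigma = i_gate_percent / 100.0
    test_ratio = innovation_m_s ** 2 / (gate_sigma ** 2 * innov_variance)
    return test_ratio <= 1.0  # above 1: measurement rejected


# A 1 m/s innovation from a receiver claiming 0.1 m/s accuracy:
print(passes_velocity_gate(1.0, 0.1, 0.5, 0.0, 500))  # True  (floor of 0.5 widens the gate)
print(passes_velocity_gate(1.0, 0.1, 0.1, 0.0, 500))  # False (floor of 0.1 leaves it tight)
```

In a Kalman update the same floored variance also sets how much weight the measurement gets, which is why raising the M_NSE value should make the filter trust GPS velocity less, while lowering it to 0.1 changes nothing if the receiver already reports 0.1 m/s or worse.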

Whether I’ll run that test is uncertain. I have the new GPS and I’d rather resolve the noise issue than adjust the software to ignore it. I suspect I’ll have a new harness built before I fly the next time.