Yaw (and plane) out of control in auto modes (ATT.Yaw)

Yes, lots of sensor information is fed into the EKF. Someone who is more experienced than me in looking at the EKF could probably spot the problem fast.

A small, but important point: I observe the EKF Lat/Lng to slowly diverge from the GPS, and then to “jump” back in agreement. (Not to “jump away” to an incorrect value.)

You’re probably correct on that, but it diverges fast enough to aggravate me :slight_smile:

This happened in my first night flight ever…about 1/4 mile away. So I was extremely happy to get it back in FBW-A to a successful landing. Especially when I also got an RC failsafe and the lights went out for a couple of seconds.

RR

Judging from the time scale on your graphs, it looks like it diverges for a period of about 5-10 seconds and then comes back to reality in a second or less. I still have no idea.

Since the accelerometers and the flight controller are both on the PixHawk, replacing the PixHawk might help. But I’d still love to hear from someone who knows how the EKF could do this.

RR

I’ve gone back and looked at the IMU data and don’t see anything anomalous before the first upset (or after). So, no sudden changes in either compass xyz, GPS lat/long/alt or any axis of the IMU gyros or accelerometers.

There are a couple of spikes in the GPS parameter GPS.GCrs just before the time of the first upset. I’m uncertain of the parameter definition or significance. It’s a weak correlation since the upsets continued for a few more minutes and I don’t see any other matching spikes in the GPS.GCrs parameter.

But if the GPS.GCrs parameter is the GPS course and if the EKF uses that GPS message for navigation, I could see it creating some confusion. The following plot shows the period of the first upset (where the red line spikes up). The blue and yellow lines are x and y compass readings. The green line is the GPS.GCrs.

The compass lines are stable prior to the upset. But the green line (GPS.GCrs) changes from 185 to 320 just prior to the upset.

Does the EKF for Arduplane use the GPS heading? Does it need the GPS heading…or could I reprogram my GPS to stop sending the messages with the GPS course (GxVTG, GxRMC)?

RR

I’ve buckled down to learn about the EKF data and attempted a new analysis. The tutorial for the EKF at http://ardupilot.org/dev/docs/extended-kalman-filter.html#extended-kalman-filter is very helpful.

EKF3 contains all of the “innovations” for the model. Innovations are the differences between the model and the subsequent sensor data. My first graph plots the EKF3 values for IVN and IVE (GPS velocity vs. model velocity). This looks almost like a square wave with a 12-15 second period during the “upsets” in navigation.

The EKF tutorial notes that “These are an important measure of health for the navigation filter. If you have good quality IMU and GPS data they will be small and around zero.” I read this as saying that only the IMU and GPS data are involved here.

My 2nd graph shows IPN and IPE added (GPS position North and East vs model position):


At the beginning of each 12-15 second period, the positions match. But they drift away for the duration of the 12-15 seconds…and then snap back to match. The drift direction matches the direction of the excessive velocity.

Could the compass still be involved? It’s doubtful. As shown in the following graph (where I overlay IMX and IMY), there is a gigantic innovation for the magnetic sensors at the beginning of this sequence. But it’s so big that the EKF stops using it for navigation…therefore it can’t be a cause of the subsequent errors:

The next graph, from NKF4 (EKF4), shows SV and SP. These are estimates of GPS velocity and position errors respectively.

Notice that the velocity error occurs and then the position drifts. I conclude that the position drifts off because the velocity was wrong. In other words, the position error is an effect of the velocity error…not the root cause.

If I look at the GPS.GCrs value (and treat it as the GPS course) for the first 10 seconds of this problem, I see impossible values (shown here in red).

The course changes from due south to almost north (330 degrees) in less than 1/4 second. 4 seconds later, it goes from east to north in 1/4 second. It’s a big plane limited to 45 degree bank angle turns, so moves of that magnitude aren’t possible.
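As a rough sanity check on "not possible", the reported course change can be compared with the turn rate of a level coordinated turn, omega = g * tan(bank) / V. This is only a sketch: the 15 m/s speed is an assumed value for illustration (the thread does not state the plane's speed), and the function names are made up here.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def max_turn_rate_deg_s(bank_deg: float, speed_m_s: float) -> float:
    """Turn rate (deg/s) of a level coordinated turn at the given bank and speed."""
    return math.degrees(G * math.tan(math.radians(bank_deg)) / speed_m_s)


def course_rate_deg_s(course_a: float, course_b: float, dt_s: float) -> float:
    """Shortest signed course change from a to b (deg), divided by elapsed time."""
    delta = (course_b - course_a + 180.0) % 360.0 - 180.0
    return delta / dt_s


# 45 degree bank limit from the post; 15 m/s is an ASSUMED speed.
limit = max_turn_rate_deg_s(bank_deg=45.0, speed_m_s=15.0)
# Due south (180) to 330 degrees in 1/4 second, as described in the post.
observed = course_rate_deg_s(180.0, 330.0, 0.25)
print(f"limit {limit:.1f} deg/s, observed {observed:.1f} deg/s")
```

Even taking the shortest way around the compass, the reported change works out to hundreds of degrees per second, far beyond what a 45 degree bank allows at any plausible flying speed.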

Both the GPS and IMU (as part of the PixHawk) share the characteristic of devices that have worked well w/o these errors over the last year. But I can replace the GPS far more easily and it’s less money. So, I’ll try replacing it.

RR

I’m just now getting a chance to read your posts. Great work! I’m not sure if any of this is helpful, but just in case…

Yes, it is. It is the “float ground_course;” member of the AP_GPS class (Line 132 of AP_GPS.h)

Interesting note: Its calculation may depend on what GPS unit you have. I see that some GPS use atan2(vel.y, vel.x) to calculate from velocity, while others read the information directly from the GPS. (I am not surprised that some GPS units may calculate this internally, and ArduPilot just uses it as-is.)
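For anyone who wants to reproduce the velocity-based version in a log analysis script, here is a minimal sketch. It assumes vel.x is the north component and vel.y the east component (a NED frame), which I have not verified against every driver; the function names are illustrative, not ArduPilot's.

```python
import math


def ground_course_deg(vel_north: float, vel_east: float) -> float:
    """Course over ground in degrees (0 = north, 90 = east), wrapped to 0..360."""
    # Mirrors atan2(vel.y, vel.x) under the ASSUMPTION x = north, y = east.
    return math.degrees(math.atan2(vel_east, vel_north)) % 360.0


def ground_speed_m_s(vel_north: float, vel_east: float) -> float:
    """Horizontal speed from the same two velocity components."""
    return math.hypot(vel_north, vel_east)


print(ground_course_deg(-10.0, 0.0), ground_speed_m_s(3.0, 4.0))
```

Note that the course is poorly defined when the speed is near zero, so small velocity noise can swing the result wildly at low ground speed.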

@RogerR Out of curiosity, what GPS unit are you using? (I don’t see if you’ve ever told us)

I’d like to know the answer to these questions, too. Does anyone know?

I know how EKF’s work, and I think your conclusion of “only IMU and GPS data are involved here” is not correct. (The tutorial comment isn’t meant to be read so precisely… it’s just giving the ‘gist’ of the idea.)

Yes! This corresponds to the behavior I observed above as well with my plots titled “EKF drifts away from GPS and ‘jumps back’”.

I agree. Good conclusion.

Based on your excellent analysis, I’m also persuaded to check if the GPS is doing something suspicious. I’ll take a look and post what I find.

Are any ArduPilot EKF experts following this post? Do you see something we’re missing? I’m not even sure who to tag. @priseborough @WickedShell @tridge

I found an idea which may or may not be relevant: The GPS is (for some reason) reporting Speed and Course which are slightly different than its own reports of Lat/Lng position in time.

Here’s my details:

  1. I took GPS.Lat and GPS.Lng and converted them to positions in meters N and meters E (from the first GPS reading) via a spherical earth model with radius 6.3781*10^6 m. (I’m pretty sure this is the model used inside ArduPilot, too. I’ll verify with a code reference if anyone cares.)

  2. I did a first-order speed and course calculation on this gps_in_NE.

[Details: If (T1,N1,E1) are the first (time, posN, posE) triplet and (T2,N2,E2) are the second, form dT=T2-T1, dN=N2-N1, dE=E2-E1. Then my_speed = sqrt((dN/dT)^2+(dE/dT)^2) and my_course=wrapto360(90-atan2d(dN/dT, dE/dT)).]

  3. Take a look at plots of GPS.Spd vs my_speed, and GPS.GCrs vs my_course:

You don’t have the benefit of zooming in, but my calculation disagrees with the GPS often by 2[m/s] or more for sustained periods (~10sec) of time. I’m surprised by this. The direction is similarly off by say… 20[deg] sometimes. (Also, I did a separate calculation including altitude-change in the speed calc, making it 3D speed. It did not change the discrepancy so I’m not showing it here.)

I still don’t know if this effect might be enough to disrupt the EKF’s estimate so badly. Could the EKF be confused by receiving slightly inconsistent Speed/Course info from a GPS?
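The position-differencing steps listed above can be sketched roughly as follows. The earth radius and the course formula come from the post; the flat, small-area projection (scaling longitude by the cosine of the reference latitude) is an assumption about how the conversion was done, and the function names are made up for illustration.

```python
import math

EARTH_RADIUS_M = 6.3781e6  # spherical-earth radius quoted in the post


def latlng_to_ne(lat_deg, lng_deg, lat0_deg, lng0_deg):
    """Metres north and east of the reference fix (lat0, lng0).

    ASSUMPTION: small-area spherical approximation, adequate over a flying field.
    """
    north = math.radians(lat_deg - lat0_deg) * EARTH_RADIUS_M
    east = (math.radians(lng_deg - lng0_deg) * EARTH_RADIUS_M
            * math.cos(math.radians(lat0_deg)))
    return north, east


def speed_and_course(fix1, fix2):
    """First-order speed (m/s) and course (deg) between two (t, north, east) triplets."""
    (t1, n1, e1), (t2, n2, e2) = fix1, fix2
    dt = t2 - t1
    vn, ve = (n2 - n1) / dt, (e2 - e1) / dt
    speed = math.hypot(vn, ve)
    # Same form as the post: wrapto360(90 - atan2d(dN/dT, dE/dT))
    course = (90.0 - math.degrees(math.atan2(vn, ve))) % 360.0
    return speed, course


# Moving 10 m due east in 1 s gives 10 m/s on a course of 90 degrees.
print(speed_and_course((0.0, 0.0, 0.0), (1.0, 0.0, 10.0)))
```

Differencing consecutive fixes amplifies position noise, so some disagreement with GPS.Spd and GPS.GCrs is expected; it is the sustained offsets that are interesting.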

@RogerR A fundamental question… how securely is your GPS attached? If it could “wobble” during flight, and it has built-in IMU’s and Compasses, that might be a problem. But that’s just a wild idea.

Hunt0r, the GPS is (supposed to be) a u-blox NEO-M8N. I say “supposed to be” since I ordered it from Banggood and I understand that there could be some clones from that part of the world. It works pretty well: I routinely get 9-12 satellites in my house and it gets a fix more quickly than any other GPS that I’ve used.

Since the GPS has no other source of data beyond past GPS fixes, it must be calculating the course from the set of previous positions. Arduplane software could do the same calculation itself (from the last few positions) rather than getting the derivation from the GPS. Whether it’s u-blox or ArduPilot software, intermittent errors seem odd.

I ordered another GPS unit last night.

Since I’m running EKF2, I should have looked at the tutorial about it at http://ardupilot.org/dev/docs/ekf2-estimation-system.html. One immediately useful data point from this is that I can see the SV and SP (velocity and position error estimates) for both IMUs. And the graph for NKF9 SV and SP matches the last one posted above for NKF4. So, unless the gyros for both IMUs went crazy at once, there’s yet another finger pointing at bad velocity data from the GPS.

Another thing I noticed that’s clear in the graphs you provided on the position variance is that the magnitude of the EKF velocity during each period of divergence closely matches the magnitude of the GPS velocity. It’s only the direction of that divergence that varies between the two. And in most (if not all) of the cases, the EKF model’s error is that it continues to track the previous course too long.

Perhaps the GPS is providing stale GPS.GCrs updates…or ardupilot is processing the updates a bit too late?

RR

GPS is stuck to plane with double sided sticky stuff…and it’s about 2" in diameter, so that’s plenty to stick it down really well. Zero chance of wobble independent of plane.

I see evidence in your latest graphs that GPS speed is more of a contributor to the problem than course.

There’s a parameter (EK2_GLITCH_RAD) that can be set in plane to control the maximum radial uncertainty in position between the value predicted by the filter and the value measured by the GPS before the filter position and velocity states are reset to the GPS. Making this value larger allows the filter to ignore larger GPS glitches but also means that non-GPS errors such as IMU and compass can create a larger error in position before the filter is forced back to the GPS position. The default value for the parameter is 25 meters. Thus, one would hit the reset threshold after about 12.5 seconds (25 meters at a speed error of 2 meters per second).

This matches the data…where the snapback always seems to occur about 12-15 seconds after the divergence begins. So I think the answer to your question is that 2 meters per second is enough to create a problem.

Given my problem, I have reduced this value to 10 meters (which is the lowest value offered). I think this will reduce the length and magnitude of the divergence.
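The arithmetic behind that estimate is just radius divided by velocity error. The sketch below assumes a constant velocity error and a position error that starts at zero, which is a simplification of what the filter actually does; the function name is made up for illustration.

```python
def seconds_to_glitch_reset(glitch_radius_m: float, velocity_error_m_s: float) -> float:
    """Time for a constant velocity error to push position past the glitch radius."""
    return glitch_radius_m / abs(velocity_error_m_s)


# Default radius (25 m) and the reduced value (10 m), with the 2 m/s
# speed discrepancy seen in the comparison plots.
for radius_m in (25.0, 10.0):
    print(radius_m, seconds_to_glitch_reset(radius_m, 2.0))
```

Under these assumptions the reset would come after 12.5 seconds at the default radius and after 5 seconds at 10 meters, so the divergence should be both shorter and smaller.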

There is another parameter (EK2_VELNE_M_NSE) that tells the EKF to avoid trusting velocity information from the GPS as much. Unfortunately, the description says that this parameter is only used if the GPS does not provide a speed accuracy estimate. I’ve reduced it from .5 meters/sec to .1, but I doubt it will have any effect.

This all seems like it could be caused by a compass_orient parameter being set incorrectly. Does your plane always show an accurate heading when on the ground and connected to a GCS? The newest releases of plane have an automatic orientation detection parameter.

Nathan,

If you look at the 2nd graph, way up at the top of this thread you can see that compass readings are hardly changing at all until after the problem manifests itself. The EKFs quickly decide not to trust the compass since the heading doesn’t make sense…but that’s only because the EKF thinks it is somewhere that it is not.

The scale for compass innovations is on the right side of that chart. Once the value goes above 1, the EKF doesn’t use the compass for position determination. So the compass is very good (<.2) before the problem shows up and so bad (>1) that it’s never used after the problem shows up.

I’m pretty sure we’ve narrowed it down to bad (or stale) GPS velocity data being processed by the EKF. Either it’s being:

  1. Calculated incorrectly by the GPS, or

  2. Delayed in transmission (in the GPS or in the PixHawk), or

  3. Processed incorrectly by the EKF.

I’m leaning towards one of the first two possibilities, based on hunt0r’s last set of graphs.

I can replace the GPS. However, this situation does point out that the EKF is unprotected against bad GPS velocity inputs when the error estimates that the GPS provides for velocity are also incorrect. Since the EKF programs the GPS and trusts its error estimates for velocity, there’s no parameter that can tell the EKF to be skeptical of the GPS velocity inputs.

RR

Perhaps I should add the new GPS as a 2nd one rather than replacing the original. This might provide the opportunity to get some smoking gun telemetry :slight_smile:

Two new pieces of information.

  1. I flew again and collected another set of data. There were no unacceptable GPS position innovations. However, there were two periods with velocity innovations above 1 during flight (and dozens above .5).

  2. I found a posting at https://github.com/ArduPilot/ardupilot/issues/4450 where priseborough described the *_M_NSE and *_I_GATE parameters:

There are two parameter types. The *_M_NSE parameters set the minimum value of noise that the GPS fusion will be allowed to use regardless of what the receiver reports. This is there to protect against receivers that are overly optimistic. The *_I_GATE parameters set the number of percentile standard deviations allowed for the innovation before it fails consistency checks and the measurement is rejected. The size of the gate therefore adjusts to the reported accuracy of the GPS, however the minimum accuracy is always bounded by the relevant *_M_NSE parameter.

This is a bit different than what I expected for EK2_VELNE_M_NSE. The text in Mission Planner says that this parameter “sets a lower limit on the speed accuracy reported by the GPS”. From the priseborough description, I’d say rather that it “limits the estimate of speed accuracy in setting GPS horizontal velocity observation noise”.

In any event, I should be able to change the value (currently at .1) to a higher value to make the EKF trust the GPS velocity measurements less.
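Here is a very simplified sketch of how the noise floor and the gate from the quoted description interact. It is not the ArduPilot implementation: it assumes the gate value is in hundredths of a standard deviation (so 500 means 5 sigma), it treats the innovation variance as measurement noise only (a real filter adds the state covariance), and the function name and the example numbers are made up for illustration.

```python
def velocity_test_ratio(innovation_m_s, reported_acc_m_s, m_nse_m_s, gate_percent):
    """Simplified consistency check for one velocity axis; > 1.0 means rejected."""
    # Noise floor: never trust the receiver more than *_M_NSE allows.
    noise = max(reported_acc_m_s, m_nse_m_s)
    # ASSUMPTION: *_I_GATE is in hundredths of a standard deviation.
    gate_sigma = gate_percent / 100.0
    # ASSUMPTION: innovation variance == measurement noise variance.
    return (innovation_m_s / (gate_sigma * noise)) ** 2


# Same 2 m/s innovation and 0.3 m/s reported accuracy, different settings.
print(velocity_test_ratio(2.0, 0.3, 0.5, 500))  # default floor and gate
print(velocity_test_ratio(2.0, 0.3, 0.1, 500))  # lower floor: receiver's 0.3 is used
print(velocity_test_ratio(2.0, 0.3, 0.5, 300))  # tighter gate
```

In this toy model, raising the noise floor makes the same innovation look less alarming (the GPS velocity is trusted less but rejected less often), while lowering the floor or tightening the gate pushes the ratio toward rejection.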

Whether I’ll run that test is uncertain. I have the new GPS and I’d rather resolve the noise issue than adjust the software to ignore it. I suspect I’ll have a new harness built before I fly the next time.

Do you have a .TLOG and .BIN log of the flight that I might be able to view?

I just uploaded them to https://drive.google.com/open?id=0B_IdJRw2njBET1RFYXl5eGduZEU

I shared the log directory, the two files you asked for are the last two uploaded with a 12/2/2018 date.

Sorry about being slow to get them up here. I’m still uncertain what was going on on this flight. And I’m now ready to try again with a new GPS. But, the weather hasn’t cooperated yet.

RR

About the last flight:
I went back and compared some key EKF parameters and realized that I had changed two parameters prior to flying.

  1. EK2_VELNE_M_NSE was changed from .5 to .1

  2. EK2_GLITCH_RAD was changed from 25 to 10

I don’t believe EK2_GLITCH_RAD came into play during the flight. Position never drifted far enough away to cause a reset.

EK2_VELNE_M_NSE (if I understand it) does impact the data. When NKF4 SV is plotted, I believe that it is comparing inconsistency in the GPS velocity to the parameter. Since I shrank the parameter by a factor of 5, the EKF plot is scaled up. If I were to divide the plotted values in the graph below by 5, there wouldn’t be any that would be considered a problem (they all would be <.4).

If I have analyzed correctly, this flight didn’t show the problem and was OK. I had airspeed sensor problems that resulted in flying too slowly. This convinced me earlier that the plane was erratic, but it was probably doing a stall and correcting (a big problem, but not one for this thread).

So, the problem appeared to vanish for this flight. However, I flew yesterday and will add another post about the problem.

On to the new flight. On this one, I think I may have really solved my airspeed sensor issue (seems to be electromagnetic interference with the sensor and not with the I2C cable). I also flew with a new GPS unit (although it’s still a ublox M8N).

In addition to the two parameters mentioned above, I also changed the EK2_VEL_I_GATE from 500 to 300. Here’s the SV chart:


Although there are 4 or 5 places that look bad, only one occurrence (about 2/3rds of the way into the flight) exceeds a value of 5. Since I have the EK2_VELNE_M_NSE parameter set at 1/5th the default value…I believe that’s the only occurrence that would cause the EKF to cease using the data from the GPS if I reverted to the default parameter settings.

But one occurrence is one too many and leads me to believe that there’s still a problem lurking. I’ve now changed the GPS and the primary compass. Any idea of where to look next?

Log is at https://drive.google.com/open?id=1MzIJzecMt9R8xj5FcLAAwsK6ISX6XPn_

RR

I generally don’t see anything out-of-line in your logs, although I would improve your prop balancing or IMU isolation. At higher throttle settings, I think you are having some aliasing that is causing those SV values to rise. Yours seem to average around 0.5 until later in the flight, when your average throttle increased.

Those SV values are probably increasing because of the Z-acceleration variance (IMU.AccZ).

I compared the SV values to my most recent flight, and my values for a large, super-stable twin-engine plane peak around 0.2.

My z-acceleration does not vary nearly as much as yours. Here is yours:

Here is mine:

At this point, only looking at data is difficult. A picture of your plane and autopilot attachment would really help me diagnose any problems that you are having. It does look like your plane is following its mission well.

Here’s a picture of the plane and one of the flight controller bay.

The PixHawk is soft mounted with one of these:

It’s possible that a small amount of vibration might be coupled in from two sources. First, I had a telemetry cable routed through with quite a bit of heat shrink on it. I’ve trimmed it back to minimize the possibility. Second, the front of the controller just barely touches the bulkhead in front of it (it appears in contact, but there’s enough room that a piece of paper will slide between the two w/o binding).

If I look at the Vibrations 3.3 plot from the logs, the max values are still well within spec. The documentation says you need to be below 15m/s/s with occasional higher peaks. My max for the Z axis is only a tiny bit over 3.

While clearly present, I tend to think that the higher level of vibration coincident with higher throttle is OK. If you look at the following graph, there are some large SV peaks (yellow line) with the lower throttle (red line) and some (including the biggest) are with the higher throttle.