V3.4.4 - Quad Flyaway - CTUN.DAlt is NaN

Hello!

I have a Quad that I’ve been flying for about a year. Flight controller is PixHawk. I updated to v3.4.4 about a month ago. I autotuned PITCH and ROLL successfully and made several flights since then.

Yesterday I had a flyaway. Luckily there was no harm other than to the Quad itself. I was able to recover it a couple of miles away thanks to the telemetry link that stayed alive until after the crash.
Flight was as follows: I used my RC transmitter to take off in Stabilize mode and then moved to AltHold as I usually do once the copter reached a few meters altitude. I then switched to Loiter mode and attempted to correct the position of the copter (pitch/roll) that appeared to be drifting due to a light wind. I immediately noticed that the copter was not reacting to my pitch/roll inputs so I attempted an RTL. The copter didn’t seem to react to the RTL command either so I moved to Land but still no reaction.

The copter kept climbing until I lost sight of it. Fortunately, the telemetry link was working and allowed me to follow the path that the copter was taking. I retried RTL from Mission Planner and also made a few tries to control the copter using Guided mode on the Mission Planner Satellite Map but still no luck. The copter kept climbing steadily (and drifting presumably in the direction of the winds).

Batteries started to run out about 10 min after take-off and the copter started to descend. It eventually crashed a couple of miles away. I managed to recover it in reasonable shape given the circumstances.

I’m looking for some help in understanding the reason for the failure. Attached is the (zipped) dataflash log with the hope that someone may be able to shed some light.

I’m a novice and never really looked at logs before. I did manage to find my RCIN Roll Input right after take off but didn’t see the expected effect on the ATT Desired Roll. Is there anything in the log that could help explain that?

I also noticed that CTUN.DAlt became “NaN” quite early in the flight. Would that explain why the copter kept climbing? What could be the reason for that?

I’m not sure whether this is relevant, but I did make 2 changes to the copter parameters right before the flight. One change was to the FENCE altitude (increased from 50m to 80m). I also changed the RTL speed to 1000.

I will appreciate any suggestions and/or pointers to information that may help me get to the root cause of the failure.

Thanks!!

2017-03-05 10-53-04.zip (4.0 MB)

Looks like you tried everything except Stabilize mode.

From the logs it looks like it went into Fence breach mode and was flying back to where it thought the fence was. I didn’t see a location for home so I don’t know where it was flying to.

Mike

After further investigation the fence breach is caused by the altitude max being reached which was set at 80 meters. I guess it was just flying in the wind at 1700 meters.

Yesterday I had a flyaway. Luckily there was no harm other than to the Quad itself. I was able to recover it a couple of miles away thanks to the telemetry link that stayed alive until after the crash

Well, I’m glad you’ve got your vehicle back. Really bad fly-away,
'though. You might consider using the motor interlock or e-stop in
future.

Flight was as follows: I used my RC transmitter to take off in Stabilize mode and then moved to AltHold as I usually do once the copter reached a few meters altitude. I then switched to Loiter mode and attempted to correct the position of the copter (pitch/roll) that appeared to be drifting due to a light wind. I
immediately noticed that the copter was not reacting to my pitch/roll inputs so I attempted an RTL. The copter didn’t seem to react to the RTL command either so I moved to Land but still no reaction.

The log shows you activating simple mode; did you mean to do that? It
seems to go insane as soon as that happens.

The copter kept climbing until I lost sight of it. Fortunately, the telemetry link was working and allowed me to follow the path that the copter was taking. I retried RTL from Mission Planner and also made a few tries to control the copter using Guided mode on the Mission Planner Satellite Map but still no luck.
The copter kept climbing steadily (and drifting presumably in the direction of the winds).

Yes, the log definitely shows you tried a lot of different things!

I’m looking for some help in understanding the reason for the failure. Attached is the (zipped) dataflash log with the hope that someone may be able to shed some light.

A telemetry log may also be useful.

I’m a novice and never really looked at logs before. I did manage to find my RCIN Roll Input right after take off but didn’t see the expected effect on the ATT Desired Roll. Is there anything in the log that could help explain that?

I also noticed that CTUN.DAlt became “NaN” quite early in the flight. Would that explain why the copter kept climbing? What could be the reason for that?

Yes, CTUN.DAlt going away is definitely an issue.

Note CTUN.ThI going to maximum from that point onwards, too - despite the
RCIN.C3 being low.

Peter

Thank you Mike for looking into it!

You are absolutely right re Stabilize. I should have tried that to regain control (i.e. removing as much as possible of the failing autonomous behavior). In the heat of the moment I didn’t think of it.

Fence was indeed breached but that happened as a result of the flyaway climb. And if at all, fence should have triggered an RTL.

Thank you Peter.
Yes, Simple mode was intentional. I have both Stabilize and AltHold modes set with Simple Mode. It has been this way since I started flying this Quad so it doesn’t seem like a prime suspect to me.

My understanding is that CTUN.ThI goes to maximum because the software is probably interpreting the bogus DAlt as an “infinitely high” desired altitude. The big question is why is DAlt becoming NaN? Could it be a memory overrun?
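One reason a NaN target can be so damaging is that ordinary range checks do not catch it: every ordering comparison against NaN is false. The sketch below is a generic illustration in Python, not ArduPilot code; `clamp_naive` and `clamp_checked` are hypothetical names, and the limits are arbitrary numbers.

```python
import math

def clamp_naive(value, lo, hi):
    # Both comparisons are False when value is NaN,
    # so a NaN input falls straight through unchanged.
    if value < lo:
        return lo
    if value > hi:
        return hi
    return value

def clamp_checked(value, lo, hi, fallback):
    # Reject NaN and +/-inf explicitly before applying the limits.
    if not math.isfinite(value):
        return fallback
    return clamp_naive(value, lo, hi)

bad_target = float("nan")
print(math.isnan(clamp_naive(bad_target, 0.0, 80.0)))   # the NaN survives the clamp
print(clamp_checked(bad_target, 0.0, 80.0, 20.0))       # the fallback is used instead
```

The point is only that a limit such as a maximum altitude cannot by itself contain a target that is already NaN; an explicit finiteness test is needed.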

In looking further at the log, I noticed that CTUN.DAlt becoming NaN happens right after the move to RTL mode. Presumably, when I selected RTL, DAlt should have been set to RTL_ALT that was configured to 20m (2000cm) as can be seen in the corresponding PARM entry in the log. Or is there any other FW processing that sets the DAlt upon entering RTL?
Could it be possible that somehow the RTL_ALT parameter had been messed in the Pixhawk memory? Could it have anything to do with the fact that I had modified the RTL_SPEED param to 1000 right before this flight?

In looking further at the log, I noticed that CTUN.DAlt becoming NaN happens
right after the move to RTL mode. Presumably, when I selected RTL, DAlt

Note that the NTUN message is the first to get NaNs in it. The CTUN
message is probably copying a value out of there.
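For anyone wanting to repeat this check on their own log, here is a minimal sketch of finding which message goes NaN first. It assumes the log has already been converted into lists of dicts, one per message; the rows and the `DPosX` field name below are made up for illustration, and `first_nan` / `first_nan_time` are hypothetical helpers, not part of any log tool.

```python
import math

def first_nan(rows, field):
    """Return (index, row) of the first row whose `field` is NaN, else None."""
    for i, row in enumerate(rows):
        value = row.get(field)
        if isinstance(value, float) and math.isnan(value):
            return i, row
    return None

def first_nan_time(rows, field):
    """Return the TimeUS stamp of the first NaN in `field`, else None."""
    hit = first_nan(rows, field)
    return None if hit is None else hit[1]["TimeUS"]

# Illustrative rows only; real logs contain many more fields and samples.
ntun = [
    {"TimeUS": 1000, "DPosX": 12.5},
    {"TimeUS": 2000, "DPosX": float("nan")},
    {"TimeUS": 3000, "DPosX": float("nan")},
]
ctun = [
    {"TimeUS": 1000, "DAlt": 3.0},
    {"TimeUS": 2000, "DAlt": 3.1},
    {"TimeUS": 3000, "DAlt": float("nan")},
]

print(first_nan_time(ntun, "DPosX"), first_nan_time(ctun, "DAlt"))
```

Comparing the two timestamps shows which message was affected first, which narrows down where the bad value originated.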

I’m only poking this log every now and then, sorry.

This does look bad, 'though.

Thanks for spotting it Peter. NTUN indeed shows NaNs in all of its Dxxxx fields right after entering LOITER Mode (before the NaNs in CTUN.DAlt that show up when I subsequently moved to RTL).

I assume the NTUN NaNs are in line with the lack of reaction to my pitch/roll stick movements as I was trying to correct the position while in LOITER (which is what motivated me to switch to RTL where the real climbing “fun” began…).

So was this really a PixHawk memory corruption?

Thanks for spotting it Peter. NTUN Indeed shows NaNs in all of its Dxxxx
fields right after entering LOITER Mode (before the NaNs in CTUN.DAlt that
show up when I subsequently moved to RTL).

I assume the NTUN NaNs are in line with the lack of reaction to my
pitch/roll stick movements as I was trying to correct the position while in

Absolutely. The vehicle was attempting to loiter in a position valid only
in some alternate reality. Your pilot inputs were the only thing keeping
it vaguely sane.

Sadly, amongst all of the modes you tried, all of them required the
navigation controller! Stabilize and acro would have been OK. althold
may still have been screwy, but only in one dimension :)

LOITER (which is what motivated me to switch to RTL where the real climbing
“fun” began…).

So was this really a PixHawk memory corruption?

I hope not, and it would have to be fairly significant corruption to cross
all the variables in question. I think it more likely that you’ve found a
path through the code which doesn’t initialise the navigation variables
(when moving to loiter they should be initialised from the current
position).

I’ll run a valgrind autotest just to make sure it isn’t a blatant overrun,
'though :)

Interesting…I had a similar issue but only have the T-Log as I never found the quad. Figured something I did screwed it up and I didn’t have stabilize available - used the position for auto in what was supposed to be a “quick” 100 foot auto test for checking lidar accuracy. Learned my lesson about having a kill switch AND stabilize always available. I didn’t pursue the issue that caused the flyaway since I thought it was just my own stupidity. No matter what I tried (outside of stabilize) nothing affected the determination of the bird to climb at a heck of a rate and head to what it thought was home. Based on telemetry data, the bird flew 8 miles away and landed somewhere in the forest (thankfully). I don’t want to tell anyone how high the bird flew - let’s just say, it just kept climbing uncontrollably. Tried everything (as darcopter). Thought I should chime in since the anomaly might not affect only me - though it was probably due to some parameter that I improperly set.

Thank you jamescooper1 for sharing your similar experience. Do you recall making any parameter change just prior to the flyaway? As I said in my original post, I did make a couple of changes myself prior to my flight.

I would very much like to know the root cause for the corruption of the variables that caused the flyaway.

Since this could be such a dangerous situation, would it make sense to suggest an additional robustness mechanism in the autonomous modes that programmatically verifies the consistency of all the “desired” variables (e.g. DAlt)? Just thinking off the top of my head, maybe some configurable threshold that limits the difference between the “desired position” and the “actual position” before acting on those values. It could be tied to a new failsafe check and trigger a LAND upon breaching the threshold.
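To make the suggestion concrete, here is a rough sketch of the kind of check I have in mind. It is plain Python for illustration, not firmware code; `target_is_sane`, `choose_action`, the 5000 cm threshold and the action names are all hypothetical.

```python
import math

def target_is_sane(desired_cm, actual_cm, max_error_cm):
    """True only if both values are finite and within max_error_cm of each other."""
    if not (math.isfinite(desired_cm) and math.isfinite(actual_cm)):
        return False
    return abs(desired_cm - actual_cm) <= max_error_cm

def choose_action(desired_alt_cm, actual_alt_cm, max_error_cm=5000.0):
    # "LAND" stands in for whatever failsafe action would be configured.
    if target_is_sane(desired_alt_cm, actual_alt_cm, max_error_cm):
        return "CONTINUE"
    return "LAND"

print(choose_action(2000.0, 1800.0))        # target close to actual altitude
print(choose_action(float("nan"), 1800.0))  # NaN target is rejected
print(choose_action(170000.0, 1800.0))      # target far beyond the threshold
```

The finiteness test comes first on purpose: a difference computed from a NaN is itself NaN, and comparing that against the threshold would not flag it reliably.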

Hi Dar, I have attached the parameter list used just before fly away.
Maybe you can see a common fault? Jim

fly away parameters 2017-02-17 14-37-43.txt (10.1 KB)

I’m really really sorry about this.

The cause of the NaNs is that the WPNAV_LOIT_SPEED parameter is set to zero and we don’t have protection for this in the waypoint navigation library. It’s not user error - we should have protections against this causing this kind of horrible flyaway.

FYI, this is the exact line which is causing the problem (divide by zero):
https://github.com/ArduPilot/ardupilot/blob/master/libraries/AC_WPNav/AC_WPNav.cpp#L196
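For readers unfamiliar with how a zero parameter turns into NaN, here is a small sketch. It is not the code at the linked line: `speed_fraction` is a hypothetical calculation that merely divides by a speed parameter, and `ieee_div` is a helper that mimics IEEE 754 float division (Python itself raises `ZeroDivisionError` instead). The 10 cm/s floor in the guarded version is an assumed value for illustration.

```python
import math

def ieee_div(a, b):
    """Divide following IEEE 754 rules instead of raising ZeroDivisionError."""
    if b != 0.0:
        return a / b
    if a == 0.0 or math.isnan(a):
        return math.nan                      # 0/0 is NaN
    # Non-zero divided by zero is a signed infinity.
    return math.copysign(math.inf, a) * math.copysign(1.0, b)

def speed_fraction(requested_cms, max_speed_cms):
    # Hypothetical unguarded scaling by a speed parameter.
    return ieee_div(requested_cms, max_speed_cms)

def speed_fraction_guarded(requested_cms, max_speed_cms, min_speed_cms=10.0):
    # Hypothetical guard: never divide by less than a small positive floor.
    return ieee_div(requested_cms, max(max_speed_cms, min_speed_cms))

print(speed_fraction(0.0, 0.0))           # NaN when the parameter is zero
print(speed_fraction(100.0, 0.0) * 0.0)   # inf times zero is also NaN
print(speed_fraction_guarded(0.0, 0.0))   # finite once the divisor is floored
```

Once a NaN exists, every arithmetic result computed from it is NaN as well, which is consistent with a whole group of desired values going bad at the same moment.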

We will fix this and push out AC3.4.6 as soon as possible.

Thanks very much for the report and for the analysis that others have done to make this easier for me to find the issue.

P.S. I’ve had a look at james cooper’s parameter file and it doesn’t have the WPNAV_LOIT_SPEED parameter set to zero.

Here’s the fix that will be peer reviewed soon and released. https://github.com/ArduPilot/ardupilot/pull/5868

I’ve done a review of the attitude and position controllers and found one other similar missing check so I’ve added protection there as well.

Thank you Randy for looking into this and for the fix!
Do you know how it is that WPNAV_LOIT_SPEED got to be 0?

I looked again at my parameters and WPNAV_LOIT_SPEED is indeed set to 0.

What is the default value for this parameter? I’m pretty sure I never intended to change it, but maybe I inadvertently touched it when I changed the RTL speed just prior to the flyaway?

Or is it possible that there is some error in Mission Planner that caused WPNAV_LOIT_SPEED to be modified when I changed the other params?
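One way to catch an unintended change like this would be to save the parameter list before and after editing and compare the two. A minimal sketch, assuming the saved files use one `NAME,VALUE` pair per line with `#` comment lines; the values below are illustrative and not taken from my actual files, and `parse_params` / `changed_params` are hypothetical helpers.

```python
def parse_params(text):
    """Parse 'NAME,VALUE' lines into a dict, skipping blanks and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(",")
        params[name.strip()] = float(value)
    return params

def changed_params(before, after):
    """Return {name: (old, new)} for every parameter whose value differs."""
    names = sorted(set(before) | set(after))
    return {n: (before.get(n), after.get(n))
            for n in names if before.get(n) != after.get(n)}

# Illustrative values only.
before = parse_params("""# saved before editing
FENCE_ALT_MAX,50
RTL_SPEED,0
WPNAV_LOIT_SPEED,500
RTL_ALT,2000
""")
after = parse_params("""# saved after editing
FENCE_ALT_MAX,80
RTL_SPEED,1000
WPNAV_LOIT_SPEED,0
RTL_ALT,2000
""")

for name, (old, new) in changed_params(before, after).items():
    print(f"{name}: {old} -> {new}")
```

Any parameter in that list that I did not mean to touch would stand out before the next flight.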

Just a guess: do you use a German keyboard layout (or a similar locale setting with , as decimal separator)? I gave up on Mission planner after a never ending stream of such situations, e.g. with PID gains getting zero…

I actually use an english keyboard. I guess your point is about some unfriendliness in the Mission Planner interface. But my concern was with the possibility of an error that could cause the overwrite of a configuration parameter that was not intended to be changed. I’m not claiming that to be the case. This could have been an inadvertent user error on my side. Just asking in case someone else experienced anything similar.

Randy, thanks for taking a look at this. I don’t feel quite so bad and did learn a valuable lesson - no flight should be taken for granted! Will be putting another test together this week, though I see some issues exist relative to LidarLite 3.3. This bird is to be used in both auto and terrain follow modes; can anyone suggest a better choice of hardware? Thanks.

I have had problems in the past with the mouse scroll wheel in MP’s parameter editor. If I inadvertently give a field focus, the scroll wheel changes the value of the focused field rather than scrolling the list of parameters. I’m careful now to ensure that I don’t click on a field before I use the scroll wheel.