Skipping invalid cmd; No mission; AUTO mode enabled -> Possible crash

I just ran into an issue that could quite easily cause a crash. It showed up when AUTO mode was accidentally switched on. The drone started to descend very fast, so I quickly took over with manual control. This happened after re-flashing to 3.6-dev and without any mission uploaded.

The error in the logs says “skipping invalid cmd #0”, and it is repeated about every 20 ms until AUTO mode is switched off. Until then it keeps going through the commands “forever”, with the desired height set to “0”. Because of that desired height it starts to descend at maximum speed, which creates a high risk of a crash.

In my opinion it should hold a hover at the current location/height when it is skipping multiple commands in a row, as in this case
OR
not allow AUTO mode to be enabled in the first place
OR
trigger a failsafe
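
To make the first option a bit more concrete, here is a rough sketch of the decision in Python. It is not ArduPilot code: the function `choose_action`, the limit of 3 and the rule that only command ID 16 (MAV_CMD_NAV_WAYPOINT) counts as valid are all assumptions for illustration, as is treating slot 0 as the home entry.

```python
NAV_WAYPOINT = 16  # MAV_CMD_NAV_WAYPOINT; the only ID treated as valid in this sketch


def choose_action(command_ids, valid_ids=frozenset({NAV_WAYPOINT}), limit=3):
    """Decide what AUTO should do with a stored mission (hypothetical policy).

    command_ids[0] is assumed to be the home slot and is skipped.
    Returns "run" if a valid command is reached before `limit` invalid
    commands have been seen, otherwise "hold" (hover at the current position).
    """
    invalid_seen = 0
    for cmd_id in command_ids[1:]:
        if cmd_id in valid_ids:
            return "run"
        invalid_seen += 1
        if invalid_seen >= limit:
            return "hold"
    # Ran out of commands without finding a valid one: hold instead of descending.
    return "hold"
```

With a mission like the one in this flight (a home entry followed only by empty commands) the sketch returns "hold", so the vehicle would stay where it is instead of chasing a target height of 0.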

Log from the flight:
Log file

pbarker@bluebottle:/tmp$ mavlogdump.py --t CMD Skipping\ invalid\ commands\ flight.bin
2018-04-20 01:53:39.64: CMD {TimeUS : 59778629, CTot : 40, CNum : 0, CId : 16, Prm1 : 0.0, Prm2 : 0.0, Prm3 : 0.0, Prm4 : 0.0, Lat : 62.1057815552, Lng : 25.6512145996, Alt : 90.0, Frame : 0}
2018-04-20 01:53:39.64: CMD {TimeUS : 59778727, CTot : 40, CNum : 1, CId : 0, Prm1 : 0.0, Prm2 : 0.0, Prm3 : 0.0, Prm4 : 0.0, Lat : 0.0, Lng : 0.0, Alt : 0.0, Frame : 0}
2018-04-20 01:53:39.64: CMD {TimeUS : 59778751, CTot : 40, CNum : 2, CId : 0, Prm1 : 0.0, Prm2 : 0.0, Prm3 : 0.0, Prm4 : 0.0, Lat : 0.0, Lng : 0.0, Alt : 0.0, Frame : 0}
2018-04-20 01:53:39.64: CMD {TimeUS : 59778850, CTot : 40, CNum : 3, CId : 0, Prm1 : 0.0, Prm2 : 0.0, Prm3 : 0.0, Prm4 : 0.0, Lat : 0.0, Lng : 0.0, Alt : 0.0, Frame : 0}
2018-04-20 01:53:39.64: CMD {TimeUS : 59778875, CTot : 40, CNum : 4, CId : 0, Prm1 : 0.0, Prm2 : 0.0, Prm3 : 0.0, Prm4 : 0.0, Lat : 0.0, Lng : 0.0, Alt : 0.0, Frame : 0}
2018-04-20 01:53:39.64: CMD {TimeUS : 59778899, CTot : 40, CNum : 5, CId : 0, Prm1 : 0.0, Prm2 : 0.0, Prm3 : 0.0, Prm4 : 0.0, Lat : 0.0, Lng : 0.0, Alt : 0.0, Frame : 0}

You had 40 waypoints loaded. The entry check is that you have more than
one (we use wp0 for “home”).
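
For anyone who wants to check a dump like this without reading it by eye, something along these lines might help. It is only a sketch: it works on the text that mavlogdump.py printed (one record per line), the helper names `parse_cmd_line` and `summarise` are made up for illustration, and it assumes that a CId of 0 marks an empty or invalid slot.

```python
import re

# Matches "Key : value" pairs inside the braces of a mavlogdump.py text record.
FIELD_RE = re.compile(r"(\w+) : ([-\d.]+)")


def parse_cmd_line(line):
    """Return the numeric fields of one CMD record as a dict of floats."""
    body = line[line.index("{") + 1 : line.rindex("}")]
    return {key: float(value) for key, value in FIELD_RE.findall(body)}


def summarise(lines):
    """Return (CTot, [CNum of every record whose CId is 0])."""
    records = [parse_cmd_line(line) for line in lines if "CMD {" in line]
    total = int(records[0]["CTot"]) if records else 0
    empty = [int(rec["CNum"]) for rec in records if int(rec["CId"]) == 0]
    return total, empty


sample = [
    "2018-04-20 01:53:39.64: CMD {TimeUS : 59778629, CTot : 40, CNum : 0, "
    "CId : 16, Prm1 : 0.0, Prm2 : 0.0, Prm3 : 0.0, Prm4 : 0.0, "
    "Lat : 62.1057815552, Lng : 25.6512145996, Alt : 90.0, Frame : 0}",
    "2018-04-20 01:53:39.64: CMD {TimeUS : 59778727, CTot : 40, CNum : 1, "
    "CId : 0, Prm1 : 0.0, Prm2 : 0.0, Prm3 : 0.0, Prm4 : 0.0, "
    "Lat : 0.0, Lng : 0.0, Alt : 0.0, Frame : 0}",
]

total, empty = summarise(sample)
print(total, empty)
```

On the two sample records this reports a stored total of 40 and flags command 1 as empty, which matches what the dump shows: the count says 40, but everything after the home entry is zeroed.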

It seems that there were 40 waypoints loaded. However, I had a fresh install (previously the PX4 stack) and then switched to ArduCopter, and I’m 100% sure that I did not upload any missions/waypoints.

So the question is how this happened and what should be done to avoid this?

I would like to find out the reason this happened as it can cause a crash. Any ideas what to look for?

It seems that there were 40 waypoints loaded. However, I had a fresh
install (previously the PX4 stack) and then switched to ArduCopter, and I’m
100% sure that I did not upload any missions/waypoints.

So the question is how this happened and what should be done to avoid
this?

Ah, new information, thanks!

This is actually rather curious. We store the number of waypoints as a
parameter. That parameter should have been zeroed when ArduPilot realised
that the parameter-version-parameter wasn’t equal to the code’s
parameter-version-value. So that’s one mystery.
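
The reset described above can be pictured roughly like this. It is a sketch in Python under stated assumptions, not the actual ArduPilot code: the keys `param_version` and `mission_count`, the constant `CODE_PARAM_VERSION` and the function `load_params` are all hypothetical names.

```python
CODE_PARAM_VERSION = 120  # hypothetical version value compiled into the firmware


def load_params(stored, defaults, code_version=CODE_PARAM_VERSION):
    """Return the parameter set to use at boot.

    If the stored version does not match the firmware's version, the stored
    values are discarded and the defaults are used, so a stale mission count
    left behind by other firmware would end up as 0.
    """
    if stored.get("param_version") != code_version:
        fresh = dict(defaults)
        fresh["param_version"] = code_version
        return fresh
    return dict(stored)


stale = {"param_version": 7, "mission_count": 40}
defaults = {"mission_count": 0}
print(load_params(stale, defaults)["mission_count"])  # 0
```

If a reset like this had run, the count would have come back as 0 rather than 40, which is why the value seen in the log is surprising.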

I would like to find out the reason this happened as it can cause a crash.
Any ideas what to look for?

The bug will be in AP_Mission. The verify callback is returning “true”
after emitting that “invalid cmd” message, indicating that we’ve completed
that “invalid waypoint”. The Mission library should then move us onto the
next waypoint - also invalid. Interestingly, I think this scheme means
that we could take 4 seconds to go through those 40 waypoints - even if it
was working correctly, which it isn’t since you’re always stuck on the one
waypoint.

Are you interested in chasing this down from a code perspective?

I think the first step is to recreate this in SITL - do you know how to do
that?

Thank you for your interest and effort.

During this 4-second window it would be descending at the maximum allowed speed, which in my case is about 4 m/s. That would cause a crash at heights below 16 meters AGL, and it could be even worse with a higher descent rate.
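
The 16 m figure is just the window multiplied by the descent rate. A tiny Python check, where `height_lost_m` is a made-up helper and the 6 m/s case is only an illustrative value, not a measured one:

```python
def height_lost_m(duration_s, descent_rate_m_s):
    """Height lost while descending at a constant rate for duration_s seconds."""
    return duration_s * descent_rate_m_s


window_s = 4.0  # time estimated above for stepping through 40 invalid waypoints

print(height_lost_m(window_s, 4.0))  # 16.0
print(height_lost_m(window_s, 6.0))  # 24.0
```

This assumes the vehicle reaches the maximum descent rate immediately, so the real loss would be somewhat smaller, but the order of magnitude is the point.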

Why does it allow a waypoint with lat 0.0, lon 0.0 and height 0.0? Shouldn’t this be an error in 99.99% of cases, so that AUTO mode would be stopped?

Another “failsafe” could be to hold the position/height while the commands are invalid.

Well, I would like to try to chase this problem down, but I’m fairly new to ArduCopter. I’m not a professional programmer, so my code-reading skills might not be enough to spot hard-to-find bugs.

I haven’t used SITL before, but I have read a bit about it and would like to learn more. I would need some rough guidance on what to look for, and I will try to work out how to get there. It might take some time though.

The reason I’m interested in finding bugs like this is that I might be using ArduCopter professionally on drones, and I would like to get 3.6 stable and bug “free” with the new loiter.

Well, I would like to try to chase this problem down, but I’m fairly new to
ArduCopter. I’m not a professional programmer, so my code-reading skills might
not be enough to spot hard-to-find bugs.

In SITL you can step through the code and inspect variables etc etc, so
it’s not a “stare at code until the bug pops out at you” thing.

I haven’t used SITL before, but I have read a bit about it and would like to
learn more. I would need some rough guidance on what to look for, and I will
try to work out how to get there. It might take some time though.

We’ve got lots of docs on the Wiki. I honestly don’t know which is going
to be the best option for you - I’m partial to the Vagrant VMs myself (if
you’re not using Linux already :wink: )

In general we interpret a lat/lon/alt of 0/0/0 as the current location so that shouldn’t cause a descent or any crazy behaviour.
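
As a rough illustration of that interpretation, here is a Python sketch. It is not the ArduCopter implementation: `resolve_target` is a hypothetical helper, and treating the lat/lon pair and the altitude as two separate "zero means current" checks is an assumption.

```python
def resolve_target(cmd, current):
    """Replace zeroed fields of a waypoint with the vehicle's current values.

    Both arguments are (lat, lon, alt) tuples. A lat/lon of 0/0 is taken to
    mean "stay at the current horizontal position", and an alt of 0 is taken
    to mean "keep the current altitude".
    """
    lat, lon, alt = cmd
    cur_lat, cur_lon, cur_alt = current
    if lat == 0.0 and lon == 0.0:
        lat, lon = cur_lat, cur_lon
    if alt == 0.0:
        alt = cur_alt
    return (lat, lon, alt)


here = (62.1057815552, 25.6512145996, 30.0)
print(resolve_target((0.0, 0.0, 0.0), here))
```

Under this reading an all-zero waypoint resolves to the vehicle's own position, so on its own it would not command a descent.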

My off-the-cuff guess is that the fast descent is caused by a waypoint having an absolute altitude specified.

Anyway, I’ve added this to our 3.6 issues list so it won’t be forgotten.