Continuous uncommanded climb, suspecting EKF issues

Firmware - ArduCopter-4.0.4
Hardware - Omnibusf4pro
GPS - Here+ RTK (but flying in 3D fix without RTK setup)
Build - 5" props quadcopter

Issue:
It could really have been a disaster; we were lucky to retrieve the quadcopter. We commanded a takeoff to 4 meters in Guided mode, and the quadcopter took off and kept climbing straight up. We gave the land command from RC6 several times and even tried switching to Stabilize from RC5, but the quadcopter did not respond to any of these commands. Within a minute of takeoff, it had climbed straight up out of sight.

A few minutes later we found it on the ground about 100 meters away, with the GPS mount broken off the quad, although we never actually saw it come down. When we retrieved it, 3 of the 4 motors were still armed, and picking it up and tilting it in every direction did not disarm it. After retrieving the log, the only abnormal thing I found was the EKF altitude estimate throughout the flight: the baro was showing correct readings, and the EKF was using the baro as its altitude source. I cannot find any other significant cause of this disastrous flight. From the logs, it seems the quadcopter did not actually crash into the ground; it seems to have landed, but we were not able to see it. Only the GPS mount was damaged; the rest of the parts are intact and working, and no props were damaged either. In the air, the quadcopter even traced some of the waypoints it was supposed to fly in Guided mode at 5 meters relative altitude.

From the logs, it took the flight controller quite a while to trigger any failsafe, the RTL was triggered too late, and I am not even sure what was happening just before the quadcopter started to come down. Please help me find out what went wrong so we can avoid this kind of catastrophic situation.

Log:
https://drive.google.com/file/d/1MprOn5FzknTWj9DZGGoyJrPcvWfG7wT9/view?usp=sharing

(Plot: CTUN.DAlt and CTUN.Alt)

No idea.

When it starts to climb, you seem to react normally for a panic situation (moving the throttle, but keeping it at minimum when nothing happens), but the copter does nothing.


Maybe a safety behavior could be implemented for these panic situations:
· for automatic modes (Loiter, Auto…) to start, the throttle must at least be centered (within a tolerance);
· in a panic, pulling the throttle low overrides the current mode and switches to a pilot-controlled mode such as Stabilize.

I am not sure whether arming in Auto starts the mission, so a delay may be needed.

Yes, we did panic somewhat trying to regain control of the quadcopter, but we failed. This quad had been flying really well for several weeks, and this EKF misbehavior occurred only in this flight. I want to know whether there is any way, via parameters or by enabling another failsafe, to avoid this scenario again, or whether this kind of mishap is bound to happen every time the EKF estimate goes AWOL like it did in this flight.

I cannot think of much I can do on my end to prepare for this scenario in flight. One option I know of is flight termination, which would definitely work, but dropping quads from large heights will cost too much, so I would really like to keep it as a last resort.

I would also like to know what causes the large variance in XKF4.SH and the innovation in XKF3.IPD. Could it be another sensor fault? The baro seems to be working well, so I don't know what went wrong with the EKF altitude estimate.
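As a rough guide to reading those two fields: XKF3.IPD is the down-position (height) innovation, the difference between the height measurement and the filter's prediction, and XKF4.SH is, as far as I understand EKF2's logging, the square root of the height test ratio. The sketch below assumes an EKF2-style gate where the squared innovation is normalized by the innovation variance times the square of a gate expressed in standard deviations (I believe EK2_HGT_I_GATE stores this gate times 100, default 500, i.e. 5 sigma; verify for your version). It is a simplified illustration, not the EKF source.

```python
import math


def height_test_ratio(innovation_m: float, innovation_var_m2: float,
                      gate_sigma: float = 5.0) -> float:
    """Squared height innovation normalized by the gated innovation variance.

    Assumption: a ratio above 1.0 means the baro sample lies outside the
    gate and is rejected, so the height estimate keeps following the IMU.
    """
    return innovation_m ** 2 / (gate_sigma ** 2 * innovation_var_m2)


def logged_sh(innovation_m: float, innovation_var_m2: float,
              gate_sigma: float = 5.0) -> float:
    """Square root of the test ratio, the form assumed here for XKF4.SH."""
    return math.sqrt(height_test_ratio(innovation_m, innovation_var_m2, gate_sigma))
```

With an innovation variance of 0.25 m² (0.5 m sigma), a 0.5 m innovation gives SH of about 0.2, while a 10 m innovation gives SH = 4.0: far outside the gate, which would be consistent with an estimate that appears to ignore the baro.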

Why is ARMING_CHECK=2 ???


Yes, I know that was a miss on my end; we were still testing some things, so we hadn't enabled the arming checks. I would like to know which arming check would have helped in this scenario.
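ARMING_CHECK is a bitmask. The sketch below decodes a value using a subset of the bit assignments listed in the ArduCopter parameter documentation (1 = All, 2 = Barometer, 4 = Compass, 8 = GPS lock, 16 = INS, and so on); please check the parameter page for your firmware version before relying on it. It shows that ARMING_CHECK=2 enables only the barometer checks, so the INS checks were off.

```python
# Subset of ARMING_CHECK bits as listed in the ArduCopter parameter docs
# (verify against the docs for your firmware version).
ARMING_CHECK_BITS = {
    1: "All",
    2: "Barometer",
    4: "Compass",
    8: "GPS lock",
    16: "INS",
    32: "Parameters",
    64: "RC Channels",
    128: "Board voltage",
    256: "Battery Level",
}


def enabled_checks(value: int) -> list[str]:
    """Return the names of the arming checks enabled by an ARMING_CHECK value."""
    if value == 0:
        return []  # all arming checks disabled
    if value & 1:
        return ["All"]  # bit 0 enables every check regardless of other bits
    return [name for bit, name in ARMING_CHECK_BITS.items()
            if bit != 1 and value & bit]
```

For example, `enabled_checks(2)` returns only `["Barometer"]`, while `enabled_checks(18)` returns `["Barometer", "INS"]`.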

Your accelerometer is toast. It showed no changes (or only minimal changes) during the flight, so it is no surprise the EKF went haywire. While the copter ascended, Acc.Z stayed at 9.8 m/s², which tells the EKF that it is not moving up or down at all.
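To illustrate why a stuck reading matters, here is a simplified 1-D dead-reckoning sketch (an assumption, not the EKF itself). It uses an upward-positive convention where a level, hovering accelerometer reads +g; ArduPilot's IMU logs use a body frame in which level Acc.Z is negative, so check the sign against your own data.

```python
G = 9.80665  # standard gravity, m/s^2


def dead_reckon_altitude(accel_z_readings, dt):
    """Integrate upward specific-force readings into an altitude change.

    Simplified model: vertical acceleration = reading - G, so a sensor
    that reports exactly G at all times predicts no vertical motion.
    """
    velocity = 0.0
    altitude = 0.0
    for reading in accel_z_readings:
        accel = reading - G          # remove gravity to get kinematic acceleration
        velocity += accel * dt       # integrate acceleration into climb rate
        altitude += velocity * dt    # integrate climb rate into altitude
    return altitude
```

A healthy sensor reading G + 1.0 m/s² for 2 s at 10 Hz integrates to about 2.1 m of climb, whereas a sensor stuck at G integrates to 0 m no matter how fast the copter actually climbs.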


Thank you, that really helps a lot. Does it mean the IMU on this Omnibus F4 Pro board is permanently damaged and needs to be replaced, or was some external factor affecting the IMU during the flight?

It seems permanently damaged. You can try a full calibration, but if it failed once, there is a chance it will fail again.

Yes, I can replace the IMU quickly. We will still try flying it tethered with a long thread to see whether this misbehavior repeats. But if the issue does not repeat after an accel recalibration and many flights, will it be safe to fly again without replacing the IMU?

I guess it is best to replace the IMU directly, as @kd0aij suggested. It would also help to know which arming checks can protect me in a situation like this, where the EKF may go haywire again. Can arming checks really prevent these scenarios? There was no warning message of any kind in the log during or after takeoff, and it took the flight controller quite a while to trigger RTL.

Perhaps the INS check would have caught something.

Thanks. I will first try to reproduce this misbehavior on this particular quad, using a tether or flight termination on an RC switch. If I can reproduce the issue on the same quad, that will confirm both the IMU fault and the reliability of the arming checks.

The Z accelerometer looks OK to me. According to the baro and GPS, the entire flight occurred at 5 m AGL, and the crash was detected and the vehicle disarmed at 6:28.5.
Vibration levels also look OK.

I suspect you are looking at the first flight, which went fine except for the landing. Yes, it seems you are describing the first flight, in which almost everything was fine. Please have a look at the next flight, which was about 30 minutes after this one.

Nah, I don't think so. Just by reconnecting the battery, everything started to work normally again; if it were hardware damage, it would never have recovered, would it?

@devs I just want to know how the EKF calculates its altitude estimate. If the baro is reporting correctly, why can't the EKF altitude estimate get anywhere close to it? It seems it didn't care about the baro values at all.

But now, looking at the first flight, which went well, I cannot see much difference between the ACC.Z logs of the first flight and of the second flight where the issue happened; they look the same on both occasions. It might be that the accelerometer was damaged from the start, yet the first flight was flown exactly as it was supposed to be.

Several:

  • your long thread, with a flexible coupling;
  • ARMING_CHECK;
  • vibration compensation (does not seem to apply in your case);
  • FS_GCS_ENABLE, and disconnect (pull) the GCS communication;
  • emergency stop on some channel or switch (I think it always stops the motors, as an
    emergency stop does, so it is better used when the copter is low)

The throttle safety behavior I mentioned before, or something similar for when anything fails, would have to be implemented in code; for example, for Autotune to start, if I am not wrong, the throttle must be in the middle position. I myself have had several sudden climbs that were never fully explained.

Things can fail. An uncontrolled drone can kill. The pilot should be able to regain control immediately whenever he wants.


I also suspect a hardware issue because the EKF’s climb rate and altitude are not even close to the barometer’s. Below is a graph of the barometer altitude (in red) and the EKF’s (in green)

Here is the baro climb rate (in red) and the EKF’s climb rate (in green).


So this looks like an accelerometer bias or scaling issue to me.
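To give a feel for how fast either error grows, here is a back-of-the-envelope sketch. It assumes the worst case where the IMU prediction runs uncorrected (for example because baro samples are being rejected); the real EKF also estimates an accelerometer bias state, so this is an upper-bound illustration, not its actual behavior.

```python
G = 9.80665  # standard gravity, m/s^2


def drift_from_bias(bias_mps2: float, t_s: float) -> float:
    """Altitude error from a constant accelerometer bias, double-integrated."""
    return 0.5 * bias_mps2 * t_s ** 2


def drift_from_scale(scale_error: float, t_s: float, g: float = G) -> float:
    """Altitude error while hovering with a fractional scale-factor error.

    At hover the accelerometer should read g; a scale error s makes it read
    (1 + s) * g, which looks like an extra vertical acceleration of s * g.
    """
    return drift_from_bias(scale_error * g, t_s)
```

For example, a 0.5 m/s² bias gives 25 m of error after 10 s, and a 2% scale error at hover gives about 9.8 m in the same time. Both grow with the square of time, which is why the EKF altitude can walk away from the baro so quickly.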

I think there have been several reported issues with one of the Omnibus boards. Perhaps we should add a warning to the wiki to steer people away from this board.

In Copter-4.1 we’ve added an additional altitude-estimate arming check (see PR here) that, if arming checks had been enabled, would have caught this problem. We have not backported it to Copter-4.0, though we probably could.
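The idea of such a check can be sketched as a comparison between the EKF altitude and the baro altitude before arming. The function name and the 1 m threshold below are hypothetical assumptions for illustration, not the code or limit from the Copter-4.1 PR.

```python
def altitude_disparity_ok(ekf_alt_m: float, baro_alt_m: float,
                          max_disparity_m: float = 1.0) -> tuple[bool, str]:
    """Hypothetical pre-arm check: refuse to arm if EKF and baro disagree.

    max_disparity_m is an illustrative threshold, not ArduPilot's value.
    """
    disparity = abs(ekf_alt_m - baro_alt_m)
    if disparity > max_disparity_m:
        return False, f"altitude disparity of {disparity:.1f} m"
    return True, ""
```

With the estimates seen in this log, a check like this would have blocked arming as soon as the EKF altitude drifted away from the baro, instead of discovering the problem in the air.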

As a side note, it looks like this vehicle is not running the standard ArduPilot code, and in Copter-4.0 EKF2 is the default. Still, I don’t have any reason to believe either of these contributed to the issue.


I have had similar issues with BMP280 baros on a Kakute F7 too. Not to this extent, but a 10 to 20 m error building up over time.