Continuous uncommanded climb, suspecting EKF issues

Yes, I have made a slight modification to ArduPilot: I read a file from the SD card using AP_Filesystem in a separate custom IO thread. I don’t think anything is going wrong with that piece of code, because I have run the same code for the last 2-3 months, several times every week, on multiple quadcopters without any hiccups.

I have EKF2 disabled and EKF3 enabled, mainly because when I used the master branch (ArduCopter v4.1.0-dev), I found that EKF2 is no longer supported on 1MB flash boards, so I thought it better to keep the same setup as in master.

Right now, to save time, I am probably going to retire this particular Omnibus F4 SD (V1) board, but it would help to know whether this problem could be solved by replacing the STM32F405 MCU, the MPU6000, or both on this board. We also have several more 5" quadcopters with the same board that are running smoothly without issues for now. And we are currently building in-house custom boards similar to this Omnibus board, but with better and different baros and IMUs.

I know bits and pieces about customizing the ArduPilot codebase, so I can try to backport this PR (https://github.com/ArduPilot/ardupilot/pull/15092) to the stable firmware (arducopter-4.0.4) we are using.


@Notorious7,

OK, sounds good. I think the backport for that PR would be a good idea actually. It’s already on the list to be backported for the official Copter-4.0 but I don’t know if/when we will do another point release before we start Copter-4.1 beta testing.

I could not resist bringing up my story from June 2018, when my helicopter went straight up (over 700m) and I did not know what was going on.

@FRED_GOEDDERT

It’s really a very different issue, but in any case, I have tried so hard to get people not to set EK2_ALT_SOURCE to 1 (rangefinder). It’s on the wiki and in the parameter descriptions, but people still set it (I found two separate users who set it over the weekend). Copter-4.0 and perhaps 4.1 have some safety improvements related to using a rangefinder with the EKF, but I have been very tempted to remove the feature completely.

I am aware of this information and totally agree with you.


Arming check only or also during flight?

Some similarities here:



BARO.Alt and CTUN.Alt were also very different there (with CTUN.Dalt also negative, BTW), and BARO.CRt/CTUN.CRt evolved similarly.
There, vibration levels increased suddenly for no apparent reason. Here there are no vibrations, but the graphs above are similar.

The flight controller there was a Chinese Pixhawk with no known problems. Is this a hardware issue?

It is a mandatory pre-arm check even if you set arming_checks to 0. But if the altitude estimate somehow goes haywire in the air, you can still take control of the drone with the throttle: the quadcopter will climb or descend according to your throttle input.

I can confirm exactly the same issue, which happened to me with a Matek F405-std FC running 4.0.0 or 4.0.1. It was exactly the issue described, maybe with a bit more luck, because I was able to take over full manual control at about 10-20m above the ground while it was aggressively descending (or was it falling?). So for me it cost only a pair of 3D-printed legs :).

The scenario was pretty much the same: take off and fly a trivial shape around the home location. During takeoff it went mad and aggressively climbed about 250m with a slight forward movement, then started an aggressive descent with a slight movement to the left. The whole episode lasted about 30-50 seconds, the longest of my life. AltHold was engaged as soon as the first symptoms of misbehaviour appeared (about 2-3 seconds after start), with no success. Next, Stab mode was engaged after about 10 seconds (I was expecting a reaction to AltHold, but it was still climbing), again with no result. Then Acro was engaged after another ~5 seconds of waiting, with no visible reaction (yes, at that moment it was already barely visible, but the control link signal strength was still OK). Then it started to descend as already mentioned, and at the very last moment I was able to take over in Acro (during the descent I tried switching to Stab/AltHold with no success). It was not a full takeover, rather a correction of the descent path, with no adequate reaction to throttle.

In my opinion the IMU failing is not the problem; shit happens. The biggest problems were the following:

  • The FC does not really respond to control from the radio (flight mode changes)…
  • The FC does not take into account an aggressive altitude change based on the baro value

@Webillo,

I think the case you’re linking to was yet another issue. Vibration levels on that vehicle were over 120m/s/s, which is just incredibly high. In that case the vibration failsafe triggered, so the climb was much smaller than it would have been otherwise.

@silentjet,

Is there a post somewhere with an onboard log? With a log I think we probably can find the cause of the issue. There are almost no restrictions on entering stabilize mode so I suspect some other issue but let’s see.

Here is the full sequence of events, with the autotune and vibration compensation intervals:


Note that for 11 seconds after autotune started, vibration levels were a bit high, but then, for no apparent reason (I was at 10m and nothing visibly happened), they increased a lot, which also increased CTUN.ThO and caused the climb.

However, my point is that the evolution of BARO.CRt / CTUN.CRt there is similar to here, with BARO.Alt and CTUN.Alt also very different (and CTUN.Dalt also negative, BTW), all on different hardware, so a hardware failure is not probable.

Unfortunately no, the SD card was missing on that flight (I was analyzing another GPS-related issue at the same time). Given the sequence of failures, most probably there was some issue with underpowering the FC, but in any case it is a completely different situation.

I don’t see anything wrong with that accelerometer. It correctly showed the accelerations when the vehicle took off and accelerated upwards, and also when it changed from climbing to falling.

Keep in mind that an accelerometer does NOT measure speed, nor position. It measures ONLY acceleration, including the effect of gravity. When a vehicle is moving at constant speed in any direction (and that includes climbing) and in a level attitude, the accelerometer should show zero in X and Y, and -9.8m/s² in Z. Even if it’s shooting up like crazy, as long as the velocity is constant, the accelerometer should still read -9.8m/s². And this one did. So the problem must lie elsewhere.
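A minimal sketch of that point, assuming a level vehicle and the NED convention (Z pointing down) that gives the -9.8m/s² reading at rest: the sensor reports specific force, i.e. true acceleration minus gravity, so velocity never appears in the formula. The function name is hypothetical.

```python
GRAVITY = 9.80665  # m/s^2, standard gravity

def specific_force_z(vertical_accel_ned):
    """Z reading of a level accelerometer, NED convention (Z down).

    The sensor measures f = a - g. With Z pointing down, gravity is +GRAVITY
    along Z, so a level sensor reads a_z - GRAVITY. Velocity is not an input.
    """
    return vertical_accel_ned - GRAVITY

# Constant climb rate at any speed: zero acceleration, the sensor reads -g.
print(specific_force_z(0.0))
# Accelerating upwards (negative a_z in NED): the reading becomes more negative.
print(specific_force_z(-2.0))
# Free fall: acceleration equals gravity, the reading is zero.
print(specific_force_z(GRAVITY))
```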

Even if it’s not my place, as a newcomer, I will attempt a general explanation of these unexplained climb-aways, which seem to be unnervingly common. It’s highly speculative, since I don’t know the detailed internal workings of the software, so take it for what little it might be worth.
Let’s assume that we have a drone that suffers from high vibration at a frequency above the lowpass filter cutoff (the propeller rotation frequency). Based on my own experience with my quad, it seems that the vibe measurement, including the clipping test, doesn’t properly report this. It seems to report only the vibration that falls below the lowpass filter cutoff. If that is true (please, can someone tell me whether it is?), then it could very easily happen that the accelerometer is clipping from the high-frequency vibration, and since the Z axis normally reads -9.8m/s², it will clip on the negative side long before it does so on the positive side. In X and Y, by contrast, which normally read zero, any moderate clipping will typically be symmetrical, and thus cause no big error in the low-pass-filtered values.
If Z does clip, then the negative peaks will be shaved off, while the positive ones will not. So after the lowpass filter we get a more positive value than is correct. And a more positive Z acceleration normally means that the vehicle is accelerating downwards, which would obviously make the EKF calculate an altitude that drops ever faster (and indeed CTUN.ALT shows that behaviour in this case!), and trigger a throttle increase that makes the vehicle climb away.
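The clipping hypothesis above can be checked numerically. This is a rough sketch under stated assumptions: a pure sinusoidal vibration, an assumed ±16g full-scale range, and a plain average over one vibration cycle standing in for the lowpass filter (a filter with a cutoff far below the vibration frequency passes roughly the mean). It does not model any particular IMU or ArduPilot's actual filter chain.

```python
import math

GRAVITY = 9.80665            # m/s^2
FULL_SCALE = 16 * GRAVITY    # assumed +/-16 g sensor range

def filtered_after_clipping(offset, amplitude, n=10000):
    """Average of one vibration cycle around `offset` after the sensor saturates.

    The average approximates what a lowpass filter with a cutoff far below the
    vibration frequency would output.
    """
    total = 0.0
    for i in range(n):
        raw = offset + amplitude * math.sin(2 * math.pi * i / n)
        total += max(-FULL_SCALE, min(FULL_SCALE, raw))  # saturation at full scale
    return total / n

# X axis: centred on zero, so clipping is symmetric and the average stays near 0.
x_error = filtered_after_clipping(0.0, 200.0)
# Z axis: centred on -g; a 160 m/s^2 vibration clips only the negative peaks.
z_error = filtered_after_clipping(-GRAVITY, 160.0) + GRAVITY
print(f"X error: {x_error:+.3f} m/s^2, Z error: {z_error:+.3f} m/s^2")
```

With these made-up numbers the Z error comes out positive (roughly +1 m/s²), which in the NED convention reads as a downward acceleration, while the X error stays essentially zero.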

I think it would be important to make sure that accelerometer clipping is checked BEFORE lowpass filtering, sample by sample. That includes any lowpass filtering done in hardware, which could be a limitation of specific accelerometers that implement some lowpass filtering on-chip.
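A minimal sketch of that order of operations: count clipping on every raw sample, then filter. The class, the 0.99 margin, the ±16g range and the first-order filter are hypothetical choices for illustration, not ArduPilot’s implementation. The usage part shows why the order matters: the filtered value stays far from full scale even while raw samples are saturating, so a check applied only after the filter would see nothing.

```python
import math

GRAVITY = 9.80665
FULL_SCALE = 16 * GRAVITY   # assumed +/-16 g sensor range
CLIP_MARGIN = 0.99          # hypothetical: within 1% of full scale counts as clipped

class AccelAxis:
    """One accelerometer axis: clipping is counted on raw samples, then filtered."""

    def __init__(self, alpha):
        self.alpha = alpha      # first-order lowpass coefficient, 0 < alpha <= 1
        self.filtered = None
        self.clip_count = 0

    def update(self, raw):
        # Clipping check on the raw sample, BEFORE any filtering.
        if abs(raw) >= CLIP_MARGIN * FULL_SCALE:
            self.clip_count += 1
        # First-order lowpass filter applied afterwards.
        if self.filtered is None:
            self.filtered = raw
        else:
            self.filtered += self.alpha * (raw - self.filtered)
        return self.filtered

axis = AccelAxis(alpha=0.01)
max_filtered = 0.0
for i in range(10000):
    # Z axis around -g with a strong vibration, 20 samples per vibration cycle.
    raw = -GRAVITY + 160.0 * math.sin(2 * math.pi * i / 20)
    raw = max(-FULL_SCALE, min(FULL_SCALE, raw))  # the sensor saturates
    max_filtered = max(max_filtered, abs(axis.update(raw)))

print(f"clipped raw samples: {axis.clip_count}")
print(f"largest filtered magnitude: {max_filtered:.1f} m/s^2 (full scale {FULL_SCALE:.1f})")
```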

Also, I think it would be very important for the software NOT to trust the accelerometer above everything else. In this case the baro was correctly saying that the vehicle was climbing away, the GPS was saying the same, and the climb rates from baro and GPS agreed very well, yet the software disregarded all that and decided that the vehicle was descending and needed more throttle, apparently based only on very slightly wrong accelerometer data caused by high-frequency vibration and undetected accelerometer clipping, integrated over several minutes! That’s crazy. The correct data was available from two independent sources, and the flight control software didn’t use it.
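The cross-check argued for above could look like the following hypothetical sketch. It only flags the situation where baro and GPS climb rates agree with each other while both contradict the EKF. The function name and tolerances are placeholders, the numbers in the usage lines are made up rather than taken from the logs, and ArduPilot’s EKF has its own innovation consistency checks, which this does not model.

```python
def sources_contradict_ekf(ekf_climb, baro_climb, gps_climb,
                           agree_tol=1.0, disagree_tol=3.0):
    """True when baro and GPS agree with each other but both disagree with the EKF.

    Climb rates in m/s, positive up. Tolerances are hypothetical placeholders.
    """
    sources_agree = abs(baro_climb - gps_climb) <= agree_tol
    baro_disagrees = abs(ekf_climb - baro_climb) > disagree_tol
    gps_disagrees = abs(ekf_climb - gps_climb) > disagree_tol
    return sources_agree and baro_disagrees and gps_disagrees

# Made-up numbers resembling the described fly-away: the EKF thinks it is
# descending, while baro and GPS both say it is climbing.
print(sources_contradict_ekf(ekf_climb=-2.0, baro_climb=8.0, gps_climb=7.5))  # True
# Normal flight: everything roughly agrees.
print(sources_contradict_ekf(ekf_climb=1.0, baro_climb=1.2, gps_climb=0.8))   # False
```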


Mallikarjun, in this case the baro was correct and the EKF calculated totally wrong data.
A slow baro drift of 10 to 20m during a flight is perfectly normal, due to temperature changes. And if you are flying in the wake of a building in strong wind, the turbulence can easily cause the baro to show fast variations of a few meters. Prop wash does too. So you can’t trust a baro for precision altitude measurement and control. But in this fly-away the vehicle climbed to over 600m! For a signal of that magnitude, baros are highly reliable.

In the region of interest CTUN.Alt and CTUN.Dalt are close and negative, even though reliable data from the barometer and GPS were available. This is hard to accept.

In the vibration level change mentioned above, after second 100 (unexplained, but also causing a sudden climb), the VIBE levels suddenly appear roughly four times greater (x4).
Pixhawk accelerometers have full-scale ranges of ±2g, ±4g, ±8g and ±16g, so imagine an instantaneous scaling mix-up, ±2g<>±8g or ±4g<>±16g.
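The x4 arithmetic behind that idea, as a hypothetical sketch: for a 16-bit signed accelerometer the sensitivity is 32768 counts divided by the full-scale range, so converting counts with a ±8g scale while the chip is actually set to ±2g (or ±16g vs ±4g) makes every reading exactly four times too large. The function names are illustrative, and whether any such mix-up happened on these vehicles is not established in this thread.

```python
GRAVITY = 9.80665

def counts_per_g(full_scale_g):
    """Sensitivity of a 16-bit signed accelerometer at a given full-scale range."""
    return 32768 / full_scale_g

def counts_to_ms2(counts, assumed_full_scale_g):
    """Convert raw counts to m/s^2 using the range the driver *believes* is set."""
    return counts / counts_per_g(assumed_full_scale_g) * GRAVITY

# Hypothetical: chip actually configured for +/-2 g, level and at rest (-1 g on Z).
raw_z = round(-1.0 * counts_per_g(2))   # -16384 counts
correct = counts_to_ms2(raw_z, 2)       # right scale
mistaken = counts_to_ms2(raw_z, 8)      # wrong scale: +/-8 g assumed
print(correct, mistaken, mistaken / correct)
```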

I have now implemented most of the arming checks and backported this PR (https://github.com/ArduPilot/ardupilot/pull/15092) from master to the stable version (arducopter 4.0.4) I am using. Even after all this, what are the chances of this issue arising during flight? I doubt any failsafe (Land, RTL, etc.) is going to help me in that case; I don’t think the copter will stop climbing.

If there is even a slight chance of this issue reproducing during flight, then I had better add a flight termination timeout condition based on the difference between the baro/GPS altitudes and the EKF altitude. I also know that with this PR (https://github.com/ArduPilot/ardupilot/pull/15092) I can get control over the quadcopter’s ascent and descent, but when I fly multiple quadcopters together this could make things messy.
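A minimal sketch of that termination condition, under stated assumptions: the class, thresholds and timeout are hypothetical, not ArduPilot parameters or APIs, and how termination would actually be triggered on the vehicle is left out. It requires the EKF altitude to disagree with both the baro and the GPS altitude, and the disagreement to persist for a timeout, so that a single glitch or the normal 10 to 20m baro drift mentioned in this thread does not trip it.

```python
class AltitudeDivergenceMonitor:
    """Hypothetical check: request termination if the EKF altitude stays far from
    both the baro and the GPS altitude for longer than `timeout_s`."""

    def __init__(self, max_divergence_m=30.0, timeout_s=5.0):
        self.max_divergence_m = max_divergence_m  # chosen above normal baro drift
        self.timeout_s = timeout_s
        self._diverged_since = None               # time the divergence started

    def update(self, now_s, ekf_alt, baro_alt, gps_alt):
        """Feed one set of altitudes (m); returns True when termination is requested."""
        diverged = (abs(ekf_alt - baro_alt) > self.max_divergence_m and
                    abs(ekf_alt - gps_alt) > self.max_divergence_m)
        if not diverged:
            self._diverged_since = None           # reset on any agreement
            return False
        if self._diverged_since is None:
            self._diverged_since = now_s
        return now_s - self._diverged_since >= self.timeout_s

# Illustrative numbers only: EKF says -30 m while baro and GPS say about 50 m.
monitor = AltitudeDivergenceMonitor()
for t in range(8):
    print(t, monitor.update(float(t), ekf_alt=-30.0, baro_alt=50.0, gps_alt=48.0))
```

Requiring both independent sources to disagree, rather than either one, is a design choice aimed at the scenario described above; it would not catch a case where the baro and the EKF are wrong together.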

I fly several quadcopters and a hexacopter (possibly overpowered, but I am waiting for full reliability before mounting a camera). When the above happened on the hexacopter (and it was not the first time; it has never happened on the quadcopters), I decided to wait before flying it again. I see that other people have had the same problem (unexpected climbs). I wonder whether 4.1.0 will improve this, or maybe I will try PX4.

@Notorious7 With your flight mode and rc_options parameters, how do you select stabilize mode? It looks like you were almost never in stabilize mode (with manual throttle control) during the uncommanded climb.

You are right, I was not able to stay in stabilize mode because I panicked when I saw the quadcopter keep climbing. I tried to switch to stabilize, but I also kept giving the land command at the same time; since land and stabilize were set on different RC switches, I got confused. But I really don’t intend to switch to stabilize again, because I have multiple quadcopters connected to the same RC transmitter in the air at the same time.