Severe problems with AC 4.0.3: drifting, FlowHold, logging

@rmackay9
Since my upgrade to AC 4.0.x I have been facing an increasing number of issues. Today I almost crashed. The copter flies great and is rock stable in Loiter for a few minutes after powering up, but then starts drifting heavily in all axes; the drift gets progressively worse until the copter becomes uncontrollable. Today, even after I managed to land, the motors would not stop spinning. After a few seconds on the ground the motors suddenly revved up, the copter took off again and (fortunately) flipped.

Next problem: in FlowHold, the copter starts dancing around erratically. With the exact same OF configuration in AC 3.6, at the same location and in the same light conditions, FlowHold worked smoothly and stably.
See video here: https://youtu.be/hLpFwkCxzxM

Next problem: after enabling Harmonic Notch filtering, a number of parameters, such as all VIBE and Optical Flow values, are no longer logged. In flight, all values are shown correctly in MP, but apparently they are no longer recorded in the log files.
Here is the logfile: https://1drv.ms/u/s!AnKeW8KMoCcyyBScOxD3jTFQ_8Ma?e=z8nXjd

Next problem: heavy baro drift. Setting TCAL_ENABLED = 2 reduced the drift, but did not eliminate it. See also my previous report here: AC 4.0.2: false logging or broken sensors?

What FC do you have? Your long loops spike when you are airborne, and your load sits at 70%, spiking to 80%, which is incredibly high. My guess is that your flight controller is overloaded and @tridge's loop slowdown is kicking in. I suspect this is also why you are losing logging, and it might also explain the discrepancy with 3.6.

Thanks for getting back to me, Andy.
It’s a Cube Green, which is identical to the Pixhawk 2.1 Cube Black except that PWM out is shifted to 5V.
MP recognizes it as a “Cube Black” during FW install.
I have a truckload of accessories connected, including:

  • A LidarLite connected via PWM
  • An ADS-B receiver on Serial 4
  • A TX2 companion computer with APSync on Serial 2
  • A MAVLink-to-Jeti telemetry converter on Serial 1
  • A PX4Flow on I2C
  • An IRLock on I2C for precision landing

Could that cause this overload? @tridge @rmackay9
Again, zero issues with the same configuration on AC 3.6.11 NuttX.

A Cube Green should have no problems, but I could imagine a bug in one of the peripheral drivers causing an issue. Can you disconnect the peripherals one by one and see how each one affects the load (PM log entries)? Also, there is a lot of oscillation on roll and pitch while you are airborne; is that expected? It matches desired roll/pitch, but I wondered what was going on at that point.

Update: you know, it’s weird. I have just checked a bunch of the logs people have posted over the last month, and they vary tremendously in long loops and load: one shows 6 long loops and 30%, another 600 and 70% like yours, and the 30% one was running my FFT code. I wonder whether some particular combination in AC 4.0.x is eating up CPU. 30% and fewer than 10 long loops is what I would expect, FWIW. There is a debug option (SCHED_DEBUG=2) for seeing loop slippage, but you need to be running MAVProxy to see the output, so that may not be easy.

@mtbsteve, @andyp1per,

Re CPU performance, it’s a random guess but I would check the EK2_IMU_MASK parameter to see how many IMUs and EKF cores are enabled. I think this has the largest impact of any parameter upon the CPU load.
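For reference, EK2_IMU_MASK is a bitmask: each set bit selects an IMU that gets its own EKF2 core, so the number of set bits is a rough proxy for the EKF's share of the CPU load. The minimal sketch below decodes a mask value, assuming bit N corresponds to IMU N; the helper name `ekf_cores` is hypothetical.

```python
# Hypothetical helper: decode an EK2_IMU_MASK value into the IMU indices
# that each get their own EKF2 core.
# Assumption: bit N set => IMU N (0-based) runs an EKF2 core.
def ekf_cores(imu_mask: int) -> list[int]:
    """Return the 0-based IMU indices whose bits are set in imu_mask."""
    return [i for i in range(8) if imu_mask & (1 << i)]


if __name__ == "__main__":
    for mask in (1, 3, 7):
        cores = ekf_cores(mask)
        print(f"EK2_IMU_MASK={mask}: {len(cores)} core(s) on IMU(s) {cores}")
```

Going from 3 to 1 halves the number of EKF2 cores in this model, which is why it is a common first step when looking for CPU headroom.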

The scheduler slow-down feature helps prevent catastrophic failures by ensuring that everything gets run but it could lead to more subtle problems as we run out of CPU. I wonder if perhaps making the user aware they are running out of CPU might be helpful.

Anyway, a bit like @andyp1per is saying, I think we need to separate the problems and investigate them individually.


You probably don’t need to disconnect the peripherals; just disable them in your config.

What would be entailed in making this a MAVLink message, with the percentage load transmitted back to the GCS for monitoring/graphing in real time?

I think this could be huge, especially for more exotic setups, or for pushing the limits of filtering.

You can do this now. Load can be seen on the Status screen, added to the Quick display, or seen on the Live Tuning screen in Mission Planner.


Available in QGC/Solex as well as a graph?

In QGC you can see it with the MAVLink Inspector and graph it under SYS_STATUS. The update rate is slow…
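The value these GCS views show comes from the `load` field of the MAVLink SYS_STATUS message, which the MAVLink common message set defines in units of 0.1% (0 to 1000). A minimal sketch for logging it yourself, assuming pymavlink is installed and the vehicle is reachable at a UDP address you supply (the default address and the helper names here are illustrative assumptions):

```python
# Sketch: turn SYS_STATUS.load into a percentage and track its peak.
# SYS_STATUS.load is specified in units of 0.1 % (range 0..1000).
def load_percent(raw_load: int) -> float:
    """Convert SYS_STATUS.load (tenths of a percent) to percent."""
    return raw_load / 10.0


class LoadTracker:
    """Keep the latest and peak CPU load seen in a stream of SYS_STATUS messages."""

    def __init__(self) -> None:
        self.latest = 0.0
        self.peak = 0.0

    def update(self, raw_load: int) -> float:
        self.latest = load_percent(raw_load)
        self.peak = max(self.peak, self.latest)
        return self.latest


def monitor(device: str = "udpin:0.0.0.0:14550") -> None:
    """Print live load from a vehicle link (the address is an assumption)."""
    from pymavlink import mavutil  # imported here so the helpers above work without pymavlink

    conn = mavutil.mavlink_connection(device)
    conn.wait_heartbeat()
    tracker = LoadTracker()
    while True:
        msg = conn.recv_match(type="SYS_STATUS", blocking=True)
        tracker.update(msg.load)
        print(f"load {tracker.latest:.1f} %  (peak {tracker.peak:.1f} %)")
```

Tracking the peak rather than only the latest value makes short spikes, like the 80% ones seen in the logs above, harder to miss.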


@rmackay9 @andyp1per
Thanks for getting back.
I have set EK2_IMU_MASK to 3 - actually I never changed this parameter in the past.

I disabled all accessories step by step and found 2 issues.

Problem 1. With Harmonic Notch filtering enabled, I can reproduce the erratic behaviour in FlowHold mode and the occasional drifting in Loiter until the copter becomes almost uncontrollable as described in my original post above. As soon as I disable the Harmonic Notch the copter behaves fine in Loiter and FlowHold works as it should. My Harmonic Notch settings are:
INS_HNTCH_MODE=1
INS_HNTCH_FREQ=80
INS_HNTCH_BW=20
INS_HNTCH_REF=0.3
INS_HNTCH_ENABLE=1
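With INS_HNTCH_MODE=1 the notch tracks throttle. A minimal sketch of that scaling, assuming the rule centre = INS_HNTCH_FREQ × max(1, √(throttle / INS_HNTCH_REF)), shows why INS_HNTCH_REF should match the real hover throttle: if REF is set below the actual hover throttle, the notch already sits above INS_HNTCH_FREQ in a hover. The function name is hypothetical and this is not ArduPilot's code.

```python
import math


# Sketch of throttle-based harmonic notch tracking (INS_HNTCH_MODE=1).
# Assumption: centre = FREQ * max(1, sqrt(throttle / REF)), so the notch sits
# at FREQ at (or below) the reference throttle and rises with throttle above it.
def notch_centre_hz(freq_hz: float, ref: float, throttle: float) -> float:
    if ref <= 0:
        return freq_hz  # in this sketch a non-positive REF disables scaling
    return freq_hz * max(1.0, math.sqrt(throttle / ref))


if __name__ == "__main__":
    # Values from the post: INS_HNTCH_FREQ=80, INS_HNTCH_REF=0.3
    for thr in (0.2, 0.3, 0.36, 0.45, 0.6):
        print(f"throttle {thr:.2f} -> notch at {notch_centre_hz(80, 0.3, thr):.1f} Hz")
```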

I have not yet been able to pin down why Harmonic Notch filtering causes these issues. I have it enabled on a Solo in parallel, where it works fine. However, apart from a LidarLite, I have no additional accessories attached to my Solo.
I am guessing right now that there might be a conflict with the Optical Flow enabled.

Problem 2. I am logging simultaneously to the Pixhawk SD card and to my companion computer via APSync, with LOG_BACKEND_TYPE = 3.
As I found out today, the logs on the companion computer contain only a subset of the values recorded on the Pixhawk. That explains the strange log values recorded on the CC.
Interestingly, as soon as Harmonic Notch filtering is enabled and INS_LOG_BAT_MASK is set, the number of logged parameters drops even further. That explains why I did not get any VIBE or OF data in the logs from my original post above. On the Pixhawk, all data are recorded correctly.
What can cause the loss of recorded parameters on the CC? It's a TX2, and I had only dflogger and mavlink_router running for this test, so it's certainly not a lack of CPU power.
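LOG_BACKEND_TYPE is a bitmask, so a value of 3 enables two backends at once. A small decoding sketch, assuming the bit layout from the ArduPilot parameter documentation (bit 0 = File on the SD card, bit 1 = MAVLink streaming to a logger such as dflogger, bit 2 = Block); the helper name is hypothetical:

```python
# Sketch: decode LOG_BACKEND_TYPE into the enabled logging backends.
# Assumed bit layout: bit 0 = File, bit 1 = MAVLink, bit 2 = Block.
BACKEND_BITS = {0: "File", 1: "MAVLink", 2: "Block"}


def log_backends(value: int) -> list[str]:
    """Return backend names whose bits are set in value (dict order is preserved)."""
    return [name for bit, name in BACKEND_BITS.items() if value & (1 << bit)]


if __name__ == "__main__":
    print(log_backends(3))  # SD card file plus MAVLink stream to the companion computer
```

The two backends are separate writers, so the SD-card log and the companion-computer log do not have to contain the same set of messages.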

Here is the log taken from the Pixhawk SD card with HNTCH disabled - everything worked fine:
https://1drv.ms/u/s!AnKeW8KMoCcyyCUaCBoeGHR2kerv?e=jpSZQ4

Here is a log taken from the Pixhawk SD card with HNTCH enabled, which includes the erratic FlowHold behaviour, the increasing drift after a few minutes of flight, and the crash when I tried to land at the end.
https://1drv.ms/u/s!AnKeW8KMoCcyyCZtMNme7zelIG8l?e=q1nSRy

Ok, so I definitely think there is some kind of interaction between the harmonic notch and flowhold here. The notch should definitely not make things worse, it’s just a bit of extra filtering.
In your notch log I see the same roll and pitch oscillation that you posted earlier; this is definitely why you can’t control the copter. I also see a large oscillation in the FlowHold data, so I am wondering if there is some kind of feedback loop going on. I will need @Leonardthall's help to diagnose it. One thing you could try is to reduce the FlowHold filter frequency.
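To illustrate what lowering a filter cutoff such as FHLD_FILT_HZ does, here is a textbook single-pole low-pass filter applied to a synthetic oscillation. This is an assumption-laden sketch, not ArduPilot's FlowHold code: the function names, the 400 Hz sample rate and the 3 Hz test oscillation are all illustrative choices.

```python
import math


# Generic first-order (single-pole) low-pass filter.
# Assumption: a textbook filter standing in for whatever FlowHold uses internally.
def lowpass(samples: list[float], cutoff_hz: float, dt: float) -> list[float]:
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (dt + rc)
    out, y = [], samples[0]
    for x in samples:
        y += alpha * (x - y)  # move a fraction alpha towards the new sample
        out.append(y)
    return out


def residual_amplitude(osc_hz: float, cutoff_hz: float,
                       rate_hz: float = 400.0, seconds: float = 5.0) -> float:
    """Peak output for a unit-amplitude sine at osc_hz, skipping the start-up transient."""
    dt = 1.0 / rate_hz
    n = int(seconds * rate_hz)
    x = [math.sin(2.0 * math.pi * osc_hz * i * dt) for i in range(n)]
    y = lowpass(x, cutoff_hz, dt)
    return max(abs(v) for v in y[n // 2:])


if __name__ == "__main__":
    for fc in (5.0, 2.0):
        print(f"cutoff {fc} Hz: 3 Hz oscillation passes at {residual_amplitude(3.0, fc):.2f}x")
```

The trade-off is phase lag: a lower cutoff suppresses more of the oscillation but delays the flow signal, which can itself destabilise a feedback loop, so it is a tuning experiment rather than a guaranteed fix.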
One question - is 0.3 your hover throttle?

It has always bothered me that flowhold pushes the unfiltered gyros into the driver, but I do not understand flow hold well enough to know why that might be. I suspect that this may be the problem however.

This looks like you may be maxing out your processor. That would cause you to lose logging and maybe your telemetry would slow down.

Maybe you should try turning off the second EKF to save some processing.
EK2_IMU_MASK = 1

I see similar FlowHold behavior with these parameters.
You can see it at the end of this short video.
I get fewer swings with these settings:

FHLD_FILT_HZ 2

INS_HNTCH_ATT,15
INS_HNTCH_BW,100
INS_HNTCH_ENABLE,1
INS_HNTCH_FREQ,200
INS_HNTCH_HMNCS,3
INS_HNTCH_MODE,1
INS_HNTCH_REF,0.26

EDIT: inserted the correct INS_HNTCH params
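INS_HNTCH_HMNCS selects which harmonics of the base frequency get a notch. A minimal sketch, assuming bit N of the mask places a notch at (N + 1) × INS_HNTCH_FREQ (bit 0 being the fundamental); the helper name is hypothetical:

```python
# Sketch: list the notch frequencies implied by INS_HNTCH_FREQ and INS_HNTCH_HMNCS.
# Assumption: bit n set => notch at (n + 1) * base frequency.
def harmonic_freqs(base_hz: float, hmncs: int) -> list[float]:
    return [base_hz * (n + 1) for n in range(8) if hmncs & (1 << n)]


if __name__ == "__main__":
    # Settings from the post above: INS_HNTCH_FREQ=200, INS_HNTCH_HMNCS=3
    print(harmonic_freqs(200, 3))
```

With throttle tracking enabled, every harmonic in this list moves together with the base frequency.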

Back to testing soon with a new body and the Location one. EDIT: I was able to get a good lock without much drop. Update: still having issues.

Thanks Andy. I just re-examined my logs: yes, INS_HNTCH_REF should be 0.36 rather than the 0.3 I have been flying with. According to the graphs in fftui, INS_HNTCH_FREQ should be 90-95 Hz rather than the 80 Hz I originally set. Wouldn’t it make more sense to slightly increase the frequency rather than reduce it, then?
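A rough way to check whether a given INS_HNTCH_FREQ / INS_HNTCH_BW pair covers the 90-95 Hz peak seen in fftui is to treat the notch as blocking centre ± BW/2. That is a simplifying assumption: real attenuation tapers off gradually rather than stopping at a hard edge, and with throttle tracking the centre also moves. The helper names are hypothetical.

```python
# Sketch: approximate the band a notch attenuates as centre +/- BW/2.
def notch_band(centre_hz: float, bw_hz: float) -> tuple[float, float]:
    return centre_hz - bw_hz / 2.0, centre_hz + bw_hz / 2.0


def covers(centre_hz: float, bw_hz: float, peak_hz: float) -> bool:
    """True if peak_hz falls inside the approximate notch band."""
    lo, hi = notch_band(centre_hz, bw_hz)
    return lo <= peak_hz <= hi


if __name__ == "__main__":
    peak_lo, peak_hi = 90.0, 95.0  # noise peak range read from fftui in the post
    for centre in (80.0, 92.0):
        lo, hi = notch_band(centre, 20.0)
        ok = covers(centre, 20.0, peak_lo) and covers(centre, 20.0, peak_hi)
        print(f"FREQ={centre:.0f} BW=20 -> {lo:.0f}-{hi:.0f} Hz, covers 90-95 Hz: {ok}")
```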

@Leonardthall I compared my logs with Harmonic Notch enabled and disabled: the NLon and NLoop values are in the same ranges, so at least I can’t see a significant difference in CPU load there.
Max NLoop is 4000, and my max NLon is around 400 with spikes up to 500. If I understand the explanation in the wiki correctly, the percentage of slow-running loops is then between 10% and 12.5%, which is below the 15% threshold described in the wiki. Leonard, is my understanding correct?
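The arithmetic in the question above, written out as a small check: the slow-loop share is NLon / NLoop from the PM log message, compared against the 15% threshold quoted from the wiki. The function name is hypothetical.

```python
# Sketch: share of slow ("long") loops from PM log values, vs. the wiki threshold.
THRESHOLD_PCT = 15.0  # figure quoted from the wiki in this thread


def slow_loop_pct(nlon: int, nloop: int) -> float:
    if nloop <= 0:
        raise ValueError("NLoop must be positive")
    return 100.0 * nlon / nloop


if __name__ == "__main__":
    for nlon in (400, 500):
        pct = slow_loop_pct(nlon, 4000)
        verdict = "below" if pct < THRESHOLD_PCT else "at/above"
        print(f"NLon={nlon}, NLoop=4000 -> {pct:.1f} % slow loops, {verdict} {THRESHOLD_PCT:.0f} %")
```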

Unfortunately I am currently unable to do any flight testing due to the very bad weather forecast for the next few days at my location.

Yes you should try and set these parameters as accurately as possible - but I guess my point remains the same, if you can fly without the harmonic notch it shouldn’t get worse with it on.

Are you able to try some of the suggestions of reducing EKF cores/IMUs to reduce load to see if that helps? Also I’m interested to know if flow hold is a CPU hog. If you turn it off does that make things better?

Yes, your understanding of the wiki is correct. However, it does not make sense that the harmonic notch would affect Loiter or FlowHold but not Stabilize or Alt_Hold, given that the filtering it does is applied to the IMU data used in every mode. So we need to consider other ways this could happen.

@andyp1per
Hi Andy, the weather finally allowed me to fly today.
I can reconfirm that there is a dependency between Harmonic Notch and Optical Flow, independent of the number of EKF cores enabled as per @Leonardthall's suggestion.
Also with EK2_IMU_MASK = 1 the copter becomes increasingly unstable and starts drifting once Harmonic Notch and Optical Flow are both enabled. See the log here:
https://1drv.ms/u/s!AnKeW8KMoCcyyR5FFM4PjLmvjO5d?e=pN2UM1

With OF enabled and Harmonic Notch disabled, FlowHold and all other flight modes are OK. Conversely, with OF disabled and Harmonic Notch enabled, everything is fine too. With Harmonic Notch, stability is impressive: even in today's very strong winds the copter flew crisp, precise, and rock stable. Here is a log with Harmonic Notch on, one EKF core and Optical Flow disabled: https://1drv.ms/u/s!AnKeW8KMoCcyyR_auhElIEfxM6Di?e=pcML3C

I could not determine any difference with one or two EKF cores enabled. The value for NLon decreases by roughly 50% when I set EK2_IMU_MASK to 1, but this has no impact on the original issue with Harmonic Notch and Optical Flow.

Would be great if you guys could have a closer look into this.
In the meantime, I will disable Harmonic Notch on my copter since I require Optical Flow.

I don’t see the same instability I saw in the previous log. The altitude is way off though:

Whereas with just the notch altitude is stable:

The CPU load is reasonable at 70% as well. I do notice that the vertical position innovation (IPD) goes ballistic however:

this kind of looks like what vibes look like, but without the vibes. I suspect you need someone like @priseborough to diagnose because this is beyond me :frowning: