Internal errors 0x100000 I:921 flow_of_ctr Twitching motion in loiter during forward flight

Axel1 · November 23, 2022, 12:33pm

Good news!

Disabling the third EKF and disabling the batch sampler did it! I no longer have the flow_of_ctr internal error, and the number of long loops has been cut in half on the plot.

Log: https://drive.google.com/file/d/1ayH_Davq3E-Kyzfo_7OXmThYUZTvHbUT/view?usp=sharing

I have some questions though: will I be able to use autotune, for example, or other features such as using GPS for altitude, or will that put too much load on the processor?

Also, I took a look at the RATE values in the log. All the out values should be under 0.1 if I recall correctly. I believe you wrote that at some point in some topic. Mine are a bit over 0.1, and I thought I could lower my rate filters and gyro filter (rate filters at 13 and gyro at 26 as of now) to maybe 11 and 22, or even 10 and 20 if needed. I wanted to ask you about this before I start to play around with it.

Thanks!

Axel1 · November 23, 2022, 4:29pm

@rmackay9

Maybe this is interesting for you as well, as an aid to solve this “issue”.

Mine seems just to be that my FC is a bit old.

xfacta · November 23, 2022, 9:20pm

Don’t change the Gyro and Accel filters from what they are now. You can still run Autotune just fine; it does increase some logging while it runs, but that shouldn’t be a problem. That’s been happening for years.

I don’t think using GPS for Alt will affect the load much, if at all. The same data is still collected.

If load (or long loops) is still an issue you can set INS_USE3,0

Leonardthall · November 23, 2022, 11:10pm

Yeh, I have not looked at the code to work out exactly what they do over EK3_IMU_MASK. The EKF mask is going to be the main one though.

xfacta · November 24, 2022, 12:16am

Try changing
EK3_IMU_MASK,7
to
EK3_IMU_MASK,3

That should sort it out
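
For anyone unsure what the two values mean: EK3_IMU_MASK is a bitmask, with one bit per IMU. A minimal Python sketch of the decoding (the helper name `imus_in_mask` is made up for illustration and is not part of any ArduPilot tool):

```python
def imus_in_mask(mask, max_imus=3):
    """Return the 1-based IMU numbers whose bits are set in the mask."""
    return [i + 1 for i in range(max_imus) if mask & (1 << i)]


for mask in (7, 3, 5):
    print(mask, imus_in_mask(mask))
# 7 [1, 2, 3]
# 3 [1, 2]
# 5 [1, 3]
```

So going from 7 to 3 drops the EKF instance on the third IMU and keeps the first two.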

X1aero1 · November 25, 2022, 8:38am

I’m in the same boat it seems with my Cube Black. Problem is my EK3_IMU_MASK is already set to 3. I’m running a throttle based notch filter and made sure the INS_LOG_BAT_MASK is set to zero. The only thing different going from 4.2 to 4.3 is my INS_FAST_SAMPLE was set to 1 instead of the default 5. Does that even make a difference?

Axel1 · November 25, 2022, 8:53am

@rmackay9
@Leonardthall

Could this be an issue with the batch sampler?

I also turned mine off when it started working.

Leonardthall · November 25, 2022, 11:14am

Can you provide a log please.

Leonardthall · November 25, 2022, 11:15am

That may be part of the issue.

Axel1 · November 25, 2022, 11:42am

@Leonardthall

I did an autotune with no good results. Maybe you could take a look at the log later if you have the time.

But for now, I could load the previous parameters, which had a decent but not perfect tune, and test with the batch sampler on and off a couple of times if you would like?

Leonardthall · November 26, 2022, 3:17am

I have been looking at this further and I suspect Cube Blacks will need to set:
EK3_IMU_MASK 3 or 5 based on redundancy considerations (2 IMUs)
INS_ENABLE_MASK 3 or 5 (2 IMUs)
INS_USE 1 (not sure if they do anything any more)
INS_USE2 0 or 1 (not sure if they do anything any more)
INS_USE3 0 or 1 (not sure if they do anything any more)
INS_LOG_BAT_MASK 0
INS_FAST_SAMPLE 0
Limit harmonic notches to two harmonics
Don’t use any INS_HNTCH_OPTS
FFT_ENABLE 0
INS_GYRO_RATE 0
SCHED_LOOP_RATE 400

I will need to get some more input from the other devs on what to suggest.

Another setting here that I would be interested to see if it helps is:
INS_FAST_SAMPLE 0
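
As a rough aid for comparing a saved parameter set against the list above, here is a small Python sketch. It is only an illustration: the helper names are hypothetical, the `NAME,VALUE` line format is the one used elsewhere in this thread, and where the list offers a choice ("3 or 5") one value is picked arbitrarily. The INS_USE parameters are left out because their effect is uncertain.

```python
# Illustrative values taken from the list above. Where the post gives a
# choice, one option is picked here purely as an example.
SUGGESTED = {
    "EK3_IMU_MASK": 3,
    "INS_ENABLE_MASK": 3,
    "INS_LOG_BAT_MASK": 0,
    "INS_FAST_SAMPLE": 0,
    "INS_HNTCH_OPTS": 0,
    "FFT_ENABLE": 0,
    "INS_GYRO_RATE": 0,
    "SCHED_LOOP_RATE": 400,
}


def to_param_lines(params):
    """Format parameters as NAME,VALUE lines."""
    return [f"{name},{value}" for name, value in params.items()]


def differences(current, suggested):
    """Return {name: (current_value, suggested_value)} where they differ."""
    return {
        name: (current.get(name), value)
        for name, value in suggested.items()
        if current.get(name) != value
    }


# Hypothetical current settings, for demonstration only.
current = {"EK3_IMU_MASK": 7, "INS_FAST_SAMPLE": 1, "SCHED_LOOP_RATE": 400}
for name, (old, new) in differences(current, SUGGESTED).items():
    print(f"{name}: {old} -> {new}")
```

Parameters missing from `current` show up with `None` as the old value, which makes it easy to spot anything that was never checked.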

X1aero1 · November 26, 2022, 10:17am

I was wondering about the INS_FAST_SAMPLE as well. I’m going to give it a try tomorrow. I thought updating to 4.3, adding a notch filter and getting a super tight autotune was going to be great. This 6-year-old mapping drone has always flown flawlessly over thousands of missions. That’s boring. Now it’s full-on crazy lol! I’m lucky I got it back on the ground in stabilize.

Axel1 · November 26, 2022, 8:07pm

Hi,

I was out test flying today again,

I use EKF mask 3
INS_USEx won’t change even after reboot.
LOG_BAT_MASK 0
notch OPTS 0
LOOP_RATE 400

With these settings it flew well with no internal errors.

I also did a test flight with fast sample disabled if you wanted to see a log for that.

Log, fast sample off:
https://drive.google.com/file/d/1Fhlfg1HkbX4oYbzIxDj_A2nuPmtG9Q39/view?usp=sharing

Since autotune didn’t work for my aircraft, I tried a manual tune this morning. It flew well afterwards. There was no wind, so I’ll have to test that too, but could you maybe take a quick look at my log and see if I could improve my tuning somehow? Even if it won’t be noticeable during normal flying, I really want to learn as much as possible about tuning and ArduPilot! If you have the time, that is!

Tuning log:
https://drive.google.com/file/d/1TyUj4pRZocslbdlYGpczvwjCvokxxOnF/view?usp=sharing

The log is for quite a long flight where I take off and land a couple of times. By the last flight I’m done tuning, so only that portion of the log is important if you decide to take a look at it.

Edit: I tried a little investigating myself, looking at the ATT plots for roll and pitch. I can see that they are tracking quite well, although with a slight overshoot on both, a bit worse on roll, especially on small aggressive inputs. I would increase the D gain on roll and pitch just a tiny bit, maybe slightly more on roll. Is this the right thing to do, or should I decrease P?

Thanks!

Also, if there is any other test you would like me to do regarding the internal error issue or anything else I would gladly do so!

Leonardthall · November 27, 2022, 1:32am

Thanks for your help @Axel1 !

I don’t want to turn this thread into a tuning support thread but your tune looks good.

We have put overload handling into the scheduler to ensure all processes continue to run when the processor can’t keep up. However, we have not put much in the way of warning messages or instructions in place on how to handle overloaded processors. As a result we are seeing Cube Blacks with 12.5% of their loops running as slow as 1/3 of the normal rate. The average loop time is something in the order of 112% to 130% of normal from what I can work out.
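
As a sanity check on those figures, a back-of-the-envelope Python sketch. It assumes a simplified two-level model that is not taken from any log: a fixed fraction of loops takes some multiple of the nominal time and every other loop runs exactly on time.

```python
def mean_loop_time_ratio(slow_fraction, slow_factor):
    """Mean loop time relative to nominal under the two-level model."""
    return (1.0 - slow_fraction) + slow_fraction * slow_factor


# 12.5% of loops running at 1/2 or 1/3 of the normal rate (2x or 3x the time)
print(f"{mean_loop_time_ratio(0.125, 2.0):.1%}")  # 112.5%
print(f"{mean_loop_time_ratio(0.125, 3.0):.1%}")  # 125.0%
```

Under that assumption the average lands in the same range as the estimate above.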

When I designed the controllers it was done with the assumption that the update rate would be close to the specified value (400 Hz normally). I assumed incorrectly that any timing issues would be a temporary problem that would be rectified by the user. Now the kicker: this slowdown directly scales the D terms and I terms. So a 30% increase in loop time will increase the D term by 30% and reduce the I term by roughly 30%. This is enough to move many aircraft to near oscillation. Further, this is random, so it is also adding noise to the outputs.
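
The scaling can be sketched in a few lines of Python. This is a simplified model of a discrete PID that uses an assumed time step, not ArduPilot code, and the function name is made up for illustration.

```python
NOMINAL_DT = 1.0 / 400.0  # time step the controller assumes (400 Hz)


def d_and_i_scaling(actual_dt, nominal_dt=NOMINAL_DT):
    """Effective D and I gain multipliers when dt is assumed, not measured."""
    # The error changes by slope * actual_dt per loop but is divided by
    # nominal_dt, so the derivative estimate is too large.
    d_scale = actual_dt / nominal_dt
    # Each loop adds error * nominal_dt, but fewer loops run per second,
    # so the integral builds up more slowly in real time.
    i_scale = nominal_dt / actual_dt
    return d_scale, i_scale


d, i = d_and_i_scaling(1.3 * NOMINAL_DT)
print(f"D x{d:.2f}, I x{i:.2f}")  # D x1.30, I x0.77
```

In this model a 30% longer loop gives a D term 30% too large and an I term that builds at about 77% of the intended rate, which is the same order as the rough figures quoted above.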

Finally, all the navigation updates are running slower, so from the perspective of the position controller the aircraft is moving 30% faster than it should be. For example, it is reporting the correct velocity but on each time step moving further than the position controller expected, because the actual time step is larger than the one the position controller is using. This is also true for the velocity to acceleration relationship.
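
The same effect for position can be put in numbers with a tiny Python sketch. The function and the 10 m/s figure are purely illustrative assumptions.

```python
def apparent_speed(velocity, actual_dt, assumed_dt):
    """Speed implied by the distance covered per update if each update
    is assumed to last assumed_dt but really lasts actual_dt."""
    distance_per_update = velocity * actual_dt
    return distance_per_update / assumed_dt


nominal_dt = 1.0 / 400.0
print(f"{apparent_speed(10.0, 1.3 * nominal_dt, nominal_dt):.1f} m/s")  # 13.0 m/s
```

A vehicle really travelling at 10 m/s covers, per update, the distance the controller would expect from 13 m/s.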

From what I can see there are three parts to the solution to this problem:

1. Users need to be mindful to keep CPU load to a reasonable level by limiting the features being used on lower-end boards.
2. We need to provide better reporting and instructions for users to help them manage that load.
3. I need to rewrite the PID objects, internal filters, Attitude Control, Position Control, and Waypoint Navigation to update based on the measured loop time.

Some work to do…

X1aero1 · November 27, 2022, 2:21am

I made those changes and tested it in a nice controlled environment: 7 solid hours of mapping a tailings pond wall. On average, every leg ascends and descends about 300m within 1km. Lots of up and down. Today was super special, constant 17km winds gusting to 25. Oh, and -5c, too much fun. Back to its boring flights, not a hint of crazy town😂 Thanks Leonard!

Axel1 · November 27, 2022, 9:43am

Oh damn…

Would my Cube be affected in a way that makes my PIDs wrong at the moment? I have limited the processor load as you suggested and the error is gone.

Am I still affected by the D and I term getting messed up?

Also, this is only for FCs with weak processors, right? I have a Cube Orange build coming up; will that one be affected?

Thanks!

Leonardthall · November 27, 2022, 10:26am

Hi @Axel1,

Your last log was fine. Your loops needed very little extra time to ensure all tasks were completed.

The scheduler was only hinting at needing extra time. On average 1 in 40 loops were long. An interesting thing to note is that your longest loops are still about 4000.

So you are effectively seeing no gain changes.
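
To put a number on "effectively no gain changes", a quick Python estimate. The assumptions are mine rather than read from the log: the 4000 figure is taken to be microseconds, every long loop is taken to last the full 4000 (a worst case), and all other loops are taken to run at the 400 Hz nominal of 2500 microseconds.

```python
def mean_loop_time_us(nominal_us, long_us, long_every_n):
    """Mean loop time when one loop in every long_every_n takes long_us."""
    return ((long_every_n - 1) * nominal_us + long_us) / long_every_n


nominal = 2500.0  # microseconds per loop at 400 Hz
mean = mean_loop_time_us(nominal, 4000.0, 40)
print(f"{mean:.1f} us, {100.0 * (mean / nominal - 1.0):.1f}% longer")
# 2537.5 us, 1.5% longer
```

Even in that worst case the average loop is only about 1.5% longer than nominal, so the D and I scaling is negligible.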

10 Hz loops include:

update_batt_compass
arm_motors_check
auto_disarm_check
auto_trim
update_altitude
ekf_check
check_vibration
gpsglitch_check
landinggear_update
lost_vehicle_check
avoidance_adsb_update

I wonder if any of them have had a significant update for 4.3…

Axel1 · November 27, 2022, 10:36am

Maybe @rmackay9 knows if there were any changes to the 10 Hz loops.

Tell me if there is anything else that needs to be tested!

Axel1 · November 28, 2022, 11:54am

You said you would rewrite the PID loops etc. That sounds painful and time consuming, so I would assume that’s going to take a while.

When you are finished, will there be any sort of announcement somewhere since that’s quite a big change?

Thanks!

Leonardthall · November 28, 2022, 11:58am

Not as bad as it sounds. I have been planning this for a long time and wrote them with this in mind. So it is mostly checking a few things to make sure there are no hidden problems. I got started on it straight away and it looks like there are no compromises, only benefits.

github.com/ArduPilot/ardupilot

Move controllers and PID objects to a real time update system

ArduPilot:master ← lthall:20221127_PID_Real_Time

opened 08:35AM - 27 Nov 22 UTC

lthall

+397 -402

This PR changes the PID objects, internal filters, Attitude Control, Position Control, and Waypoint Navigation to update based on the measured loop time rather than an approximated regular time. This addresses serious control issues that present themselves when the autopilot becomes overloaded and the updates drop below the expected rate. This PR has the added benefit of reducing noise caused by imperfectly regular updates at all levels of the control system. This noise is caused by incorrect calculation of integrals and derivatives due to inaccurate values of time steps. Filter performance is improved over all operating conditions, resulting in more constant magnitude and phase delay and lower noise. This PR should also remove the flow of control error caused by an overloaded processor while also removing the negative impact this reduction in update rate has on control and navigation.
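
The general idea of updating from the measured time step can be sketched in Python. This is a generic illustration, not the code in the PR: the class name, the injectable `clock` argument, and the absence of filtering and integrator limits are all simplifications made for the example.

```python
import time


class MeasuredDtPID:
    """Minimal PID that uses the measured interval between updates."""

    def __init__(self, kp, ki, kd, clock=time.monotonic):
        self.kp, self.ki, self.kd = kp, ki, kd
        self._clock = clock
        self._last_time = None
        self._last_error = 0.0
        self._integral = 0.0

    def update(self, error):
        now = self._clock()
        if self._last_time is None:
            # First call: no interval yet, so only the P term contributes.
            self._last_time, self._last_error = now, error
            return self.kp * error
        dt = now - self._last_time
        self._last_time = now
        if dt <= 0.0:
            # Clock did not advance: skip the I and D updates this time.
            return self.kp * error + self.ki * self._integral
        self._integral += error * dt                 # integral uses real dt
        derivative = (error - self._last_error) / dt  # so does the derivative
        self._last_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


# Error ramping at 1 unit/s, sampled with one nominal and one 30% long loop.
ticks = iter([0.0, 0.0025, 0.00575])
pid = MeasuredDtPID(kp=0.0, ki=0.0, kd=1.0, clock=lambda: next(ticks))
outputs = [pid.update(e) for e in (0.0, 0.0025, 0.00575)]
print([round(o, 6) for o in outputs])  # [0.0, 1.0, 1.0]
```

With the measured interval the D output stays at the true error slope even when a loop runs 30% long, whereas dividing by a fixed 1/400 s would have reported 1.3 for that sample.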

It will take some time to get in but I would expect it to be in master by the new year.