Copter EKF altitude estimation error due to heavy uart communication load

Hi,

I’m currently experimenting with a Pixhawk2 + F450 + Here+ V2 RTK GPS. Since my application requires multiple quadcopters, I am using the usercode function.

The issue is that the copter sometimes suddenly loses altitude and crashes. I found that the EKF altitude innovation had diverged, which indicates that the EKF altitude estimate went wrong.

These are the two reasons why I think heavy communication leads to the wrong EKF altitude estimate:

  1. Turning off dataflash log improves performance

My application uses GPS1 (u-blox M8P), telem1 (XBee, Mission Planner communication for RTK GPS), and telem2 (another XBee, for customized takeoff, loiter, arming, etc.) on the UARTs, as well as dataflash logging. When takeoff starts, dataflash logging also starts, and this significantly reduces GPS1 communication performance: the GPS data is sometimes delayed when too much is queued in the buffer.

For now I have disabled dataflash logging as a temporary workaround, and the lag dropped significantly. Reducing the buffer size from 16 KB to 4 KB also gave better performance.

  2. Downloading the dataflash log over MAVLink results in wrong estimation
    When I try to download the dataflash log over MAVLink, the altitude divergence also occurs, so I think heavy UART communication significantly degrades EKF performance.

As far as I can tell from the EKF2 code,

in void NavEKF2_core::SelectVelPosFusion(),
readGpsData() has a problem: it does not return a valid time_ms when the UART load is heavy.

Since I need the dataflash log for analysis, disabling it is not a proper solution. Has anybody been able to get stable GPS UART data in a heavy communication environment? Or is there an alternative communication solution that does not burden the ring buffer? Also, my copter does not have a vibration problem. Any suggestion is welcome.

Try a faster FC. Pixhawk4 / 4Mini or a Kakute F7. Maybe Beaglebone Blue.

Is this on NuttX or on ChibiOS?

Hello,

Check that your addition to the user code isn’t stopping the main loop for too long.
Normally, you shouldn’t download dataflash logs in flight, as they are quite heavy and you don’t have much bandwidth on the radio.

Which Copter firmware version are you using?
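
For a rough sense of scale, here is a minimal sketch of the download time over a serial radio. The 57600 baud rate, the 8N1 framing (10 bits on the wire per byte) and the 10 MB log size are illustrative assumptions, not values from this thread, and MAVLink overhead is ignored, so a real download would be slower.

```python
def download_time_s(log_size_bytes, baud, bits_per_byte=10):
    """Lower-bound transfer time for a log over a serial link.

    bits_per_byte=10 assumes 8N1 framing (1 start + 8 data + 1 stop bit).
    Protocol overhead and retransmissions are ignored.
    """
    bytes_per_second = baud / bits_per_byte
    return log_size_bytes / bytes_per_second


# Hypothetical numbers: a 10 MB log over a 57600 baud radio link.
seconds = download_time_s(10_000_000, 57600)
print(f"{seconds / 60:.1f} minutes")
```

With these assumed numbers the link is saturated for roughly 29 minutes, which is why the download competes with everything else on the radio.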

Yes, that is also a good option, but I would like to stick with the Pixhawk2, since changing hardware costs a lot of time and money.

It’s based on NuttX. Could ChibiOS be a good alternative? I heard that it is not as stable as NuttX.

It’s ArduCopter V3.6.6-rc1. I’ll check my code again, thanks.

If it is NuttX then it is understandable… change to ChibiOS. You will be surprised.
Regarding ChibiOS, it is stable, and NuttX even got removed from 3.7… so change change change.

You should not be able to download dataflash logs in-flight. Are you able to?

  1. The dataflash download lag occurs even when not in flight.
  2. While in flight, I used gcs().send_text in usercode to check the GPS delta delay.
    My point was that the in-flight GPS delay is caused by the dataflash logging function ‘inside’ the Pixhawk. I did not download dataflash logs in flight.
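
The delta check can also be sketched offline on a list of GPS sample timestamps. This is a minimal sketch, not the actual usercode: the function name, the 200 ms nominal interval (a 5 Hz GPS) and the 1.5x tolerance are assumptions for illustration.

```python
def find_gps_gaps(timestamps_ms, nominal_ms=200, tolerance=1.5):
    """Return (index, delta_ms) for every sample that arrived late.

    A sample counts as late when the time since the previous sample
    exceeds nominal_ms * tolerance. Both defaults are assumed values.
    """
    gaps = []
    for i in range(1, len(timestamps_ms)):
        delta = timestamps_ms[i] - timestamps_ms[i - 1]
        if delta > nominal_ms * tolerance:
            gaps.append((i, delta))
    return gaps


# Hypothetical timestamps: the fourth sample arrives 500 ms after the third.
print(find_gps_gaps([0, 200, 400, 900, 1100]))
```

Running the same check with logging on and off would show whether the late samples line up with the dataflash activity.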

It took about two days to build because the wiki was outdated. I’ve posted the issues and my solutions here: Cygwin /eclipse build issues

I’m already surprised by how fast it builds… but kind of disappointed that it does not support the Lidar Lite v3 PWM mode, so I have to rewire it to I2C. Anyway, thank you for the suggestion.

Do you have a branch somewhere I could test with?

I only use a local repository, so I do not have any branches. Since I’ve decided to move to ChibiOS, NuttX is set aside for a while. Anyway, thank you for your concern :)

Is the multirotor with this problem equipped with a camera gimbal? Maybe a Storm32 control board?

Any additional information here? I have been noticing similar behavior during takeoff. The vertical EKF estimate seems to have larger-than-normal errors just as the vehicle is armed. The strange thing is that it cannot be replicated in SITL simulation, so your theory on serial overhead is reasonable. Has anyone else seen this behavior?

So after some additional investigation I have found that on the first arming after a power cycle/Pixhawk reboot, there is a very consistent 0.5 m/s vertical velocity error, which leads to a position error of 1 to 1.5 meters that is not reflected in the barometer altitude but is present in the fused altitude output. This is incredibly repeatable and unavoidable regardless of UART comms load. On all successive arming events, the vertical velocity error is smaller but still present. Interestingly enough, if I turn off all serial interfaces other than the GPS port, these successive vertical velocity errors on arming are smaller or non-existent.

I have not had a chance to mess with the dataflash logging since I need it in order to analyze results, which leads me to wonder how you can tell that it is better after turning off dataflash logging if you can’t verify through log analysis.
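
As a rough consistency check on those numbers: if the 0.5 m/s error is assumed to stay constant while it lasts (an assumption, I have not confirmed the shape of the error), the position error is just velocity error times duration, so 1 to 1.5 m would correspond to about 2 to 3 seconds. The function name is only for illustration.

```python
def implied_duration_s(position_error_m, velocity_error_mps):
    """Duration a constant velocity error must persist to build up
    the given position error (position = velocity * time)."""
    return position_error_m / velocity_error_mps


# Figures from the observation above: 0.5 m/s and 1 to 1.5 m.
for pos_err in (1.0, 1.5):
    print(pos_err, "m ->", implied_duration_s(pos_err, 0.5), "s")
```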

I will post some log images when I get a chance but it is really strange and appears to be independent of nuttx vs chibios. This seems to be present on either firmware version.

I am experimenting with the same fw version. 3.6.6

I’m probably having the same issue.
Have you reported this on GitHub?

No camera gimbals are used. It’s based on the Pixhawk2 Cube and Copter 3.6.6-rc1.

I checked the GPS delta time over the MAVLink console while dataflash logging was turned off. There was no EKF divergence, and five test flights with logging off showed no drop-down from sudden EKF altitude divergence. Of course this is trial and error, so it may not be sufficient to prove that the wrong estimation never happens.

Also, as you said, this appears to be independent of NuttX vs ChibiOS. While I was working on ChibiOS, the same EKF altitude divergence occurred. This time the overhead came not from UART communication but from SD card reads and writes, even though the usercode is almost the same. I found that the overhead was caused by calling open (the POSIX-style file function) again and again whenever read (also POSIX) failed. Calling open only once and discarding the failed read for that particular sample removed the extra computational cost, and the test flights again had no drop-down issues. I’m guessing that ChibiOS somehow messes up SD card file reads and writes alongside dataflash logging, since it does not use a proper POSIX library. (I had no file read failures on NuttX.)
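
The open-once pattern can be sketched like this. It is a minimal sketch in Python rather than my actual usercode: the class name, the fixed record size and the demo file contents are all hypothetical, and only the idea (open once, drop a failed or short read instead of reopening) comes from the description above.

```python
import os
import tempfile


class SampleReader:
    """Hypothetical reader: opens the file once, skips failed reads."""

    def __init__(self, path, record_size):
        self.record_size = record_size
        self.fd = os.open(path, os.O_RDONLY)  # opened exactly once
        self.skipped = 0

    def read_sample(self):
        """Return one record, or None if this read failed or was short."""
        try:
            data = os.read(self.fd, self.record_size)
        except OSError:
            data = b""
        if len(data) != self.record_size:
            self.skipped += 1  # drop this sample; do NOT reopen the file
            return None
        return data

    def close(self):
        os.close(self.fd)


def demo():
    """Two full 4-byte records followed by a truncated one."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"AAAABBBBCC")
        path = f.name
    reader = SampleReader(path, 4)
    results = [reader.read_sample() for _ in range(3)]
    reader.close()
    os.unlink(path)
    return results, reader.skipped
```

The point of the design is that a bad sample costs one failed read and nothing more, so the per-loop cost stays bounded.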

I’m not an RTOS expert, so I cannot do benchmark testing. I can only inspect the results and guess, but it is still very fishy that overhead caused by either too much communication (NuttX UART communication) or high computational load (ChibiOS file I/O functions) leads to EKF divergence.

If you are also using usercode, why not try to find the main cause of the high computational cost? I’m pretty sure the stable version with no usercode will perform nicely if the hardware and parameters are set up correctly.

Nope, but if others keep having the same issue, it should be reported.