ArduPlane reboot in the air on 4.2.0 with CRSF/ExpressLRS

Hi Everyone,

While I was flying today, my ArduPilot rebooted in the air. Here is my setup:

Vehicle type: ArduPlane

Firmware version: V4.2.0

Firmware hash: 44735215

Board: Mateksys F765-WSE

Controller: ExpressLRS (UART)

GPS: M8N (UART)

Telemetry: ESP8266 flashed with the Pixracer WiFi bridge (UART)

Battery: 3S 2200 mAh

Frame: ZOHD Nano Talon

Goggles: DJI FPV Goggles with telemetry enabled (UART)

I took off in AUTO with a throw launch (as always) and touched the ground during takeoff (rough start). After the mission (takeoff and RTL) was complete, I switched to FBWA and started flying circles over a big empty grass area. While I was turning, I think a failsafe was triggered by loss of the RC link. When that happened I saw a brief “CRSF” message on my DJI goggles, then all telemetry (DJI MSP and MAVLink telemetry to QGroundControl) went offline and I lost control. I heard the ArduPilot reboot sound as my plane glided down. There was no damage.

I searched a bit and found a couple of others who have experienced this. I marked the devices connected over UART, since I suspect the UART might be causing the issue.

Here are other threads about this:

In-flight Watchdog reboot on 4.0.4 and mro zero f7 (this made me suspicious of the UART)

I’m adding this as a note to myself: I found watchdog messages in log 9.

Here are my logs:
https://drive.google.com/drive/folders/17izezjtvBIW1Z-bHOAeVzKEvdd0THIJi?usp=sharing

There are two: log 8 covers the takeoff and ends when the autopilot resets,
and log 9 starts after the reset.

I also reviewed the video from my DJI goggles; interestingly, the video cuts out just before the crash. I looked for power issues too, but the battery voltage in the graphs looks fine, and I watched the plane glide down live. My SD card is too slow for recording, so the video may have been cut short for that reason (more motion means a bigger video, which needs more write speed than the SD card could sustain).

Let me know what you all think.

Hi @umbcorp, 4.2 has some advanced diagnostics we could use to get to the bottom of this.

Could you grab the @SYS/crashdump.bin file using mavftp, please?

Alternatively, arm the vehicle, wait a few minutes, disarm the vehicle and then send the log through.

Please do not flash the firmware: ArduPilot has almost certainly stashed the diagnostic information in the board’s flash, and that is overwritten when new firmware is uploaded.

Peter

@umbcorp, can you see if you can reproduce this in a bench test? Use exactly the same firmware and setup, but without the propeller, and see if you can find a way to make the issue happen. If you can, then we can start working on some patches to narrow down the issue.
I had a look at your logs, and the crash appears to happen when the CPU executes code in a section of memory that only contains data. That makes me think we may have memory corruption. I did a quick review of the ExpressLRS code and couldn’t find a cause, so it would be really helpful if you could find a way to reproduce it.

Small correction: the file is called @SYS/crash_dump.bin, and if you can get that file it would be extremely helpful. As Peter says, flashing firmware deletes this crash dump, so please don’t change the firmware.

I got crash_dump.bin. It’s attached here, and I’m also uploading it to the Google Drive folder.

Google Drive: crash_dump

crash_dump.bin (44.7 KB)

The download button in Mission Planner didn’t work (it downloaded a 0 KB file), so I had to use the Download Burst option.

I changed one parameter after this flight:

I set BRD_ALT_CONFIG to 1.

The Mateksys page says I should do that for CRSF.

Should I revert it to 0 and do my bench tests that way?

Thanks for looking into this!

Thanks for the crash_dump.bin, very useful!
We’ve narrowed the crash down to this function:
AP_RCProtocol_CRSF::write_frame
@andyp1per is the maintainer of that code and is having a look.
If you could try to reproduce it in a bench test, that would be great. Then, once we have a fix, we can be a lot more confident in it if you can no longer reproduce the problem after loading the fixed firmware.

I have produced a fix based on my discussion with @tridge: Check for bad frames in CRSF decoding by andyp1per · Pull Request #20925 · ArduPilot/ardupilot · GitHub. It would be great if you could verify whether this helps or not.
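For readers unfamiliar with CRSF, here is a minimal Python sketch of the general idea of a "bad frame" check. It is not the code from PR #20925 or from `AP_RCProtocol_CRSF`. It assumes the commonly documented CRSF layout: an address byte, a length byte that counts type + payload + CRC, a type byte, the payload, then a CRC-8 (polynomial 0xD5, "DVB-S2") computed over type + payload. It also assumes a 64-byte maximum frame size. The names `crc8_dvb_s2`, `is_valid_frame` and `CRSF_MAX_FRAME_LEN` are hypothetical.

```python
# Minimal sketch of CRSF frame validation (hypothetical names; NOT ArduPilot's code).
# Assumed layout: [address][length][type][payload ...][crc8]
#   - length counts type + payload + crc bytes
#   - crc8 (poly 0xD5) is computed over type + payload

CRSF_MAX_FRAME_LEN = 64  # assumed maximum total frame size in bytes


def crc8_dvb_s2(data: bytes) -> int:
    """CRC-8 with polynomial 0xD5, initial value 0, no reflection, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ 0xD5) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def is_valid_frame(frame: bytes) -> bool:
    """Reject frames that are truncated, oversized, or fail the CRC check."""
    if len(frame) < 4:  # need at least address + length + type + crc
        return False
    length = frame[1]
    # The length byte must cover at least type + crc and fit the maximum frame size.
    if length < 2 or length + 2 > CRSF_MAX_FRAME_LEN:
        return False
    # Address and length bytes precede the bytes counted by the length field.
    if len(frame) != length + 2:
        return False
    body = frame[2:-1]  # type + payload
    return crc8_dvb_s2(body) == frame[-1]
```

The design choice worth noting is that the length byte is checked before it is trusted. If a corrupted length were used to index a fixed-size buffer, a parser could read or write past its end, which is one way the kind of memory corruption suspected above could arise.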

First I’ll try to reproduce the problem without upgrading the firmware.

I will do the following:

  1. Revert BRD_ALT_CONFIG to 0.

  2. Fake-fly the plane in my living room, on the ground without the propeller, in MANUAL mode.

  3. Try to make it lose the RC receiver connection multiple times (put my transmitter in the fridge or walk away with it).

  4. Maybe unplug and replug the receiver during the fake flight? The issue @andyp1per patched seems to involve bad CRSF frames with failing CRCs, so introducing some instability on the UART connection might produce bad frames.

I will let you know about my progress.

@umbcorp, here is a build of firmware 4.2.1 with the fix from @andyp1per. Can you confirm it works with ExpressLRS?
http://uav.tridgell.net/tmp/plane-MatekF765-SE-4.2-CRSF-fix.apj

I just did some bench testing but couldn’t trigger a reboot. I will do more testing; I need someone to leave the house with the remote. Here is what I did:

Armed on the bench without the propeller, switched to MANUAL and fake-flew (plane sitting on the bench, armed, motor running at a little throttle).

  1. Turned off the remote multiple times; the autopilot executed the failsafe successfully each time.
  2. Tried disrupting the signal between the remote and the receiver. Sadly, ExpressLRS kept working with the remote in all of these places:
     • In the microwave (microwave not on)
     • In the oven
     • In the freezer
     • In the washing machine

I couldn’t disrupt the connection to the receiver, so I will ask someone to leave the house with my remote tomorrow.

  3. Disconnected the receiver’s UART TX pin from the autopilot board. The failsafe executed successfully, with no reboot. I also played with UART RX, but that had no impact (BRD_ALT_CONFIG is set to 0, so my CRSF telemetry doesn’t work).

I will do more testing tomorrow:

  • Fully drain a battery on the bench
  • Disrupt the radio connection with distance

I will also confirm the fix you compiled for me!

It’s unsurprising that it’s difficult to reproduce: this is the first CRSF watchdog in a long time, and many people are flying with it.

Indeed, given what we found out about the issue, it will be a difficult one to reproduce. The key now is to make sure ExpressLRS still works with the firmware change that fixes the bug. If you can confirm that, I’ll include it in a new beta for 4.2.

Mission Planner Message Output

6/10/2022 5:23:37 PM : u-blox 1 HW: 00080000 SW: 2.01 (75350)
6/10/2022 5:23:35 PM : GPS: u-blox 1 saving config
6/10/2022 5:23:25 PM : ELRS: RF Mode 7, telemetry rate is 16Hz
6/10/2022 5:23:21 PM : GPS 1: detected as u-blox at 230400 baud
6/10/2022 5:23:20 PM : ELRS: RF Mode 7, telemetry rate is 45Hz
6/10/2022 5:23:19 PM : RCOut: PWM:3-12 NeoP:13
6/10/2022 5:23:19 PM : MatekF765-SE 003C003E 3056500F 20363547
6/10/2022 5:23:19 PM : ChibiOS: 93e6e03d
6/10/2022 5:23:19 PM : ArduPlane V4.2.1 (f614862d)
6/10/2022 5:23:18 PM : RCOut: PWM:3-12 NeoP:13
6/10/2022 5:23:18 PM : MatekF765-SE 003C003E 3056500F 20363547
6/10/2022 5:23:18 PM : ChibiOS: 93e6e03d
6/10/2022 5:23:18 PM : ArduPlane V4.2.1 (f614862d)
6/10/2022 5:23:18 PM : RCOut: PWM:3-12 NeoP:13
6/10/2022 5:23:18 PM : MatekF765-SE 003C003E 3056500F 20363547
6/10/2022 5:23:18 PM : ChibiOS: 93e6e03d
6/10/2022 5:23:18 PM : ArduPlane V4.2.1 (f614862d)

I can control the plane and receive telemetry over ExpressLRS with the supplied firmware. It looks fine on the bench.