My ArduPilot flight controller rebooted in mid-air while I was flying today. Here is my setup:
Vehicle type: ArduPlane
Firmware version: V4.2.0
Firmware hash: 44735215
Board: Mateksys F765-WSE
Controller: ExpressLRS (UART)
GPS: M8N (UART)
Telemetry: ESP8266 flashed with pixracer wifi bridge (UART)
Battery: 3S, 2200 mAh
Frame: ZOHD Nano Talon
Video: DJI FPV Goggles with telemetry enabled (UART)
So I took off in auto with a throw (as always), touched the ground during takeoff (rough start), switched to FBWA after the mission (takeoff and RTL) was complete, and started flying circles over a big empty grass area. While I was turning, I think a failsafe event triggered because the radio link to the remote controller was lost. When that happened I saw a brief “CRSF” text on my DJI goggles, and then all telemetry (both DJI MSP and MAVLink telemetry to QGroundControl) went offline and I lost control. I heard the ArduPilot reboot sound as my plane glided down. I had no damage.
I searched a bit about this issue and saw a couple of others experiencing it. I marked the devices that I have connected over UART, since I suspect that might be causing the issue.
There are two logs: log 8 is the one where I take off and, at the end, the autopilot resets;
log 9 is from after the reset.
I reviewed the video from my DJI Goggles as well; interestingly, the video cuts out just before the crash. I also looked for power issues, but the battery voltage in the graphs looks fine, and I watched the plane glide down live. My SD card was too slow for recording, so the video may have been cut off because of that (more action means a bigger video and a higher write rate, which the SD card could not keep up with).
Hi @umbcorp - 4.2 has some advanced diagnostics which we could use to get to the bottom of this.
Could you grab the @SYS/crashdump.bin file using mavftp, please?
Alternatively, arm the vehicle, wait a few minutes, disarm the vehicle and then send the log through.
Please do not flash the firmware, as ArduPilot has almost certainly stashed away the diagnostic information in the board’s flash - but that is overwritten when a new firmware is uploaded.
@umbcorp can you see if you can reproduce this in a bench test? Use exactly the same firmware and setup, but without the propeller and see if you can find a way to make the issue happen. If you can then we can start working on some patches to narrow down the issue.
I had a look at your logs and the crash appears to happen when it runs code in a section of memory that only has data in it. That makes me think we may have memory corruption happening. I did a quick review of the ExpressLRS code and couldn’t find a cause, so it would be really helpful if you could find a way to reproduce.
Small correction: the file is called @SYS/crash_dump.bin, and if you can get that file it would be extremely helpful. As Peter says, flashing a firmware deletes this crash dump, so please don’t change firmware.
Thanks for the crash_dump.bin, very useful!
We’ve narrowed down the crash to this function:
AP_RCProtocol_CRSF::write_frame
@andyp1per is the maintainer of that code and is having a look.
If you could try and reproduce in a bench test that would be great. Then when we have a fix we can be a lot more confident of the fix if you can no longer reproduce after loading the fixed firmware.
First I’ll try to reproduce the problem without upgrading software.
I will do the following:
1) Revert BRD_ALT_CONFIG to 0
2) Fake fly the plane in my living room, on the ground, without the propeller, in manual mode.
3) Try to make it lose the RC receiver connection multiple times (put my RC in the fridge or walk away with it)
4) Maybe unplug and replug the receiver during the fake flight? The issue @andyp1per patched seems to be on bad CRSF frames with a failing CRC, so maybe introducing some instability on the UART connection would produce bad frames?
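For anyone following along, here is a rough Python sketch of what "a bad CRSF frame with a failing CRC" means in practice. It is not the ArduPilot code and says nothing about the actual bug; it only assumes the commonly documented CRSF framing (address byte, length byte, type, payload, then a CRC-8 with polynomial 0xD5 over type and payload, 64 bytes maximum). The function names and the example address/type values (0xC8, 0x16) are my own choices for illustration.

```python
# Illustrative sketch only: assumes the commonly documented CRSF framing,
# not taken from the ArduPilot or ExpressLRS sources.
CRSF_CRC_POLY = 0xD5       # CRC-8 polynomial (DVB-S2 variant), assumed
CRSF_MAX_FRAME_LEN = 64    # assumed maximum total frame size in bytes


def crc8_dvb_s2(data: bytes) -> int:
    """Bitwise CRC-8 with polynomial 0xD5 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ CRSF_CRC_POLY) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_frame(address: int, frame_type: int, payload: bytes) -> bytes:
    """Build [address][length][type][payload...][crc]."""
    body = bytes([frame_type]) + payload
    length = len(body) + 1  # length field counts type + payload + CRC
    return bytes([address, length]) + body + bytes([crc8_dvb_s2(body)])


def is_valid_frame(frame: bytes) -> bool:
    """Reject frames with an impossible size, a wrong length field or a bad CRC."""
    if len(frame) < 4 or len(frame) > CRSF_MAX_FRAME_LEN:
        return False
    length = frame[1]
    # The length field must agree with the bytes actually received;
    # trusting it blindly is how a parser ends up reading or writing out of bounds.
    if length < 2 or length != len(frame) - 2:
        return False
    body, received_crc = frame[2:-1], frame[-1]
    return crc8_dvb_s2(body) == received_crc


# Simulate line noise: flip one bit in the payload of an otherwise good frame.
good = build_frame(0xC8, 0x16, bytes(22))
noisy = bytearray(good)
noisy[5] ^= 0x01
print(is_valid_frame(good), is_valid_frame(bytes(noisy)))
```

The flipped bit makes the CRC check fail, which is the kind of frame a loose or noisy UART connection on the bench might produce.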
I just did some bench testing, couldn’t manage to trigger a reboot. I will do more testing, I need someone to leave the house with the remote. I did the following:
Arm on the bench without propeller, switch to manual and fake fly (plane sitting on bench with a little throttle, plane is armed, motor is running)
Turned off the remote multiple times, autopilot successfully executes fail-safe events
Tried disturbing the signal between the remote and the receiver; sadly, ExpressLRS keeps working in all of the following conditions:
In the microwave (microwave is not on)
In the oven
In the freezer
In the laundry machine
I failed to disrupt the connection with the receiver, I will ask someone to leave the house with my remote tomorrow.
Disconnected the receiver’s UART TX pin from the autopilot board. Failsafe executes successfully, no reboot. Also played with UART RX, but that has no impact (BRD_ALT_CONFIG is set to 0, so my CRSF telemetry doesn’t work).
indeed, given what we found out about the issue it will be a difficult one to reproduce. The key now is to ensure ExpressLRS still works with the firmware change to fix the bug. If you can confirm that then I’ll include it in a new beta for 4.2