ArduPlane 4.0.9 froze mid-flight

Hello,

My flight controller is a Matek H743-Wing running ArduPlane V4.0.9. It froze in the middle of a flight today. The plane took off and cruised in CRUISE mode for the first several minutes without any issue. Then the flight controller seemed to freeze, and Mission Planner on the ground station showed the plane as disarmed, with InternalError 0x800. However, the onboard video shows the motor still running and the plane out of control. The plane turned around and kept losing altitude until it came down in the field.



After I retrieved the plane, I tried to connect the FC to the computer. The FC was still frozen and could not connect to Mission Planner, so I downloaded the logs with an SD card reader. The logs were split into multiple small 794 KB files, except for the first log, which covers the successful part of the flight.

The H7 chip is known to have freezing issues, and 4.0.9 includes a fix intended to prevent this.

But I'm not sure what put the FC into this dangerous state on this flight. I was lucky that the plane landed in an empty field and didn't damage any property or hurt anyone. I do want to find the root cause before reflashing the FC and flying again.

From searching, the log file that began after the reboot contains "Internal errors detected (0x800)", which indicates that the FC was rebooted by the watchdog. I'm not sure why the FC rebooted or reset in the middle of the flight: https://github.com/ArduPilot/ardupilot/issues/11296.
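As a quick check of that reading: 0x800 is the same value as the decimal `IE : 2048` in the WDOG record further down, i.e. only bit 11 of the internal-error mask is set. Below is a minimal Python sketch of decoding such a mask. The helper name and the label `watchdog_reset` are hypothetical, and mapping bit 11 to a watchdog reset is taken from the search result above, not from the ArduPilot source; any other set bit is reported only by its number.

```python
from typing import List

# Hypothetical label table. Assumption: bit 11 (0x800) means a watchdog reset,
# as reported in the forum search result; no other bits are mapped here.
KNOWN_BITS = {11: "watchdog_reset"}


def decode_internal_errors(mask: int) -> List[str]:
    """Return one label per bit set in an internal-error bitmask."""
    if mask < 0:
        raise ValueError("mask must be non-negative")
    names = []
    bit = 0
    while mask >> bit:  # stop once no higher bits remain
        if (mask >> bit) & 1:
            names.append(KNOWN_BITS.get(bit, f"bit{bit}"))
        bit += 1
    return names


print(hex(2048))                      # 0x800
print(decode_internal_errors(0x800))  # ['watchdog_reset']
```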

Could someone please help investigate the bin logs? Thanks in advance.

Video:
FC frozen state: https://youtu.be/tPAZQEm7naQ
Onboard video and OSD: issue started at 4:19 https://www.youtube.com/watch?v=lBVOksKg9yw&t=259s

Here is the log:
https://drive.google.com/file/d/1bPdmr7qJLP6IdthKUFF54A2ZAMzpb-WZ/view?usp=sharing

pbarker@bluebottle:~/rc/ardupilot(pr/decode-watchdog-fix)$ ./Tools/scripts/decode_watchdog.py "WDOG {TimeUS : 29876615, Tsk : -3, IE : 2048, IEC : 1, IEL : 218, MvMsg : 0, MvCmd : 0, SmLn : 0, FL : 122, FT : 3, FA : 136073120, FP : 59, ICSR : 4196355, LR : 135416151, TN : stor}"
    T            Scheduler Task:           -3: Waiting for sample
   IE       Internal Error Mask:         2048: ?????
  IEC      Internal Error Count:            1: 1
  IEL       Internal Error Line:          218: 218
   MM           MAVLink Message:            0: [None]
   MC           MAVLink Command:            0: [None]
   SL            Semaphore Line:            0: Not waiting on semaphore
   FL                Fault Line:          122: ?????
   FT                Fault Type:            3: HardFault
   FA             Fault Address:    0x81c4fa0: ?????
  FTP     Fault Thread Priority:           59: ?????
FICSR        Fault ICS Register:    0x4196355: [Below]
         VECTACTIVE:   3  (Hard fault)
          RESERVED1:   0 
           RETOBASE:   1  (no (or no more) active exceptions)
        VECTPENDING:   0  (Thread mode)
          RESERVED2:   0 
         ISRPENDING:   1  (Interrupt pending)
          RESERVED3:   0 
          PENDSTCLR:   0  (WO clears SysTick exception)
          PENDSTSET:   0  (SysTick not pending)
          PENDSVCLR:   0  (WO clears pendsv exception)
          PENDSVSET:   0  (SysTick not pending)
          RESERVED4:   0 
         NMIPENDSET:   0  (NMI not pending)
          FLR Fault Long Return Address:    0x8124957: ?????
   TN               Thread name:         stor: ?????
pbarker@bluebottle:~/rc/ardupilot(pr/decode-watchdog-fix)$ 
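To read the raw fields directly, here is a minimal Python sketch (the helper names are hypothetical) that parses the WDOG record above, converts the fault address (FA) and link register (LR) to hex, and splits ICSR using the ARMv7-M bit layout. Note that `ICSR : 4196355` is a decimal value, i.e. 0x400803; the `0x4196355` shown above is the decimal number with a `0x` prefix. The decoded bit fields (VECTACTIVE 3 = HardFault, RETTOBASE 1, ISRPENDING 1) are consistent with 0x400803. FA and LR both fall in the STM32 flash region starting at 0x08000000, so mapping them to functions would presumably require the ELF from the exact same firmware build, for example with `arm-none-eabi-addr2line`.

```python
from typing import Dict

# Raw WDOG record copied from the decode_watchdog.py invocation above.
WDOG = ("WDOG {TimeUS : 29876615, Tsk : -3, IE : 2048, IEC : 1, IEL : 218, "
        "MvMsg : 0, MvCmd : 0, SmLn : 0, FL : 122, FT : 3, FA : 136073120, "
        "FP : 59, ICSR : 4196355, LR : 135416151, TN : stor}")


def parse_wdog(record: str) -> Dict[str, str]:
    """Split the 'Key : value' pairs inside the braces into a dict of strings."""
    body = record[record.index("{") + 1 : record.rindex("}")]
    fields = {}
    for pair in body.split(","):
        key, value = pair.split(":", 1)
        fields[key.strip()] = value.strip()
    return fields


def decode_icsr(icsr: int) -> Dict[str, int]:
    """Extract Cortex-M ICSR bit fields (ARMv7-M layout)."""
    return {
        "VECTACTIVE": icsr & 0x1FF,           # bits 8:0, active exception number
        "RETTOBASE": (icsr >> 11) & 1,        # bit 11
        "VECTPENDING": (icsr >> 12) & 0x1FF,  # bits 20:12, pending exception number
        "ISRPENDING": (icsr >> 22) & 1,       # bit 22, external interrupt pending
        "PENDSTSET": (icsr >> 26) & 1,        # bit 26, SysTick pending
        "PENDSVSET": (icsr >> 28) & 1,        # bit 28, PendSV pending
        "NMIPENDSET": (icsr >> 31) & 1,       # bit 31, NMI pending
    }


f = parse_wdog(WDOG)
icsr = int(f["ICSR"])
print(hex(icsr))  # 0x400803
print(decode_icsr(icsr))
print(hex(int(f["FA"])), hex(int(f["LR"])))  # 0x81c4fa0 0x8124957
```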

Thanks, Peter, for decoding this watchdog log. How should we interpret it? Does it show where the root cause is?

A forum search also shows that a similar issue happened before: "RTL cause watchdog reset plane 4.1.0-dev".

@tridge, could you take a look at this repro? Let me know if you need any other logs or information to find the root cause.