Crash with 4.2.0-beta and 4.3.0-daily (bdshot)

I loaded the 2022-03-30-08:03/omnibusf4pro-bdshot build after completely wiping the FC, and the Bluejay ESCs now arm OK.
That’s the short version; here are more details:

Before that, I did an armed-state thread dump on the Feb 22nd build with a GPS fix. I put the disarmed copter under open sky, connected to MAVProxy and waited for the fix. After about a minute the FC suddenly rebooted (possibly just as the GPS got its fix). There should be some logs - ask if you need them.

Then I armed and did a thread dump (daily build, Feb 22nd 2022):
ISR PRI=255 sp=0x20000000 STACK=1256/1536
ArduCopter PRI=182 sp=0x20000600 STACK=5384/7168
idle PRI= 1 sp=0x20013568 STACK=144/352
UART_RX PRI= 60 sp=0x2001DA88 STACK=768/1104
OTG1 PRI= 60 sp=0x2001D1A8 STACK=264/656
monitor PRI=183 sp=0x20010B60 STACK=88/848
timer PRI=181 sp=0x20011D80 STACK=1488/1872
rcout PRI=181 sp=0x20011470 STACK=432/848
rcin PRI=177 sp=0x20010EE8 STACK=1064/1360
io PRI= 58 sp=0x200101D8 STACK=1448/2384
storage PRI= 59 sp=0x200117F8 STACK=864/1360
UART1 PRI= 60 sp=0x2001C7F8 STACK=248/656
UART3 PRI= 60 sp=0x2001BFB0 STACK=264/656
UART6 PRI= 60 sp=0x2001B8D0 STACK=272/656
UART4 PRI= 60 sp=0x2001B1F0 STACK=368/656
SPI3 PRI=181 sp=0x20019B88 STACK=896/1360
OSD PRI= 59 sp=0x20015430 STACK=896/1616
log_io PRI= 59 sp=0x20015C18 STACK=128/1656
I2C0 PRI=176 sp=0x20017B30 STACK=920/1360
SPI1 PRI=181 sp=0x10004010 STACK=720/1360
FTP PRI= 58 sp=0x1000E150 STACK=1440/2896

Then I wiped the FC board (by installing iNAV) and after that installed daily build 2022-03-30-08:03.
I restored the configuration (it was still complaining about unhealthy AHRS, so I recalibrated the accelerometer) and now it looks good: the ESCs arm and all motors spin.
Not flight tested yet.

For this version I also did an armed-state thread dump with a GPS fix (2022-03-30-08:03):
ISR PRI=255 sp=0x20000000 STACK=1272/1536
ArduCopter PRI=182 sp=0x20000600 STACK=5264/7168
idle PRI= 1 sp=0x20013658 STACK=144/352
UART_RX PRI= 60 sp=0x2001DA88 STACK=768/1104
OTG1 PRI= 60 sp=0x2001D1A8 STACK=264/656
monitor PRI=183 sp=0x20010C10 STACK=480/848
timer PRI=181 sp=0x20011E30 STACK=1488/1872
rcout PRI=181 sp=0x20011520 STACK=432/848
rcin PRI=177 sp=0x20010F98 STACK=1024/1360
io PRI= 58 sp=0x20010288 STACK=1408/2384
storage PRI= 59 sp=0x200118A8 STACK=848/1360
UART1 PRI= 60 sp=0x2001C858 STACK=248/656
UART3 PRI= 60 sp=0x2001C010 STACK=264/656
UART6 PRI= 60 sp=0x2001B930 STACK=248/656
UART4 PRI= 60 sp=0x2001B208 STACK=368/656
SPI3 PRI=181 sp=0x20019C40 STACK=896/1360
OSD PRI= 59 sp=0x200154D8 STACK=896/1616
log_io PRI= 59 sp=0x20015CC0 STACK=144/1656
I2C0 PRI=176 sp=0x20017B28 STACK=920/1360
SPI1 PRI=181 sp=0x200142B8 STACK=720/1360
FTP PRI= 58 sp=0x1000DE18 STACK=1440/2896
I hope my testing helps in some way!

OK, so it looks like my change has fixed the Bluejay arming problems, which is great! The log_io thread's stack still looks low to me, so I think you should fly it and see how it goes. If you get a similar watchdog then I can do a custom build with an increased stack for the log_io thread. Assuming you are OK with potentially crashing again …
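
For anyone who wants to check dumps like the ones above without reading them by eye, here is a minimal Python sketch. It assumes the two numbers in `STACK=a/b` are free bytes and total bytes, which matches log_io being called low here; the 10% threshold and the function names are arbitrary choices for illustration, not anything ArduPilot defines.

```python
import re

# Matches lines such as "log_io PRI= 59 sp=0x20015CC0 STACK=144/1656".
THREAD_LINE = re.compile(
    r"^(?P<name>\S+)\s+PRI=\s*(?P<pri>\d+)\s+sp=(?P<sp>0x[0-9A-Fa-f]+)\s+"
    r"STACK=(?P<free>\d+)/(?P<total>\d+)$"
)


def parse_thread_dump(text):
    """Return one dict per thread line; lines that do not match are skipped."""
    threads = []
    for line in text.splitlines():
        match = THREAD_LINE.match(line.strip())
        if match:
            threads.append({
                "name": match.group("name"),
                "priority": int(match.group("pri")),
                # Assumption: first number is free stack, second is total stack.
                "free": int(match.group("free")),
                "total": int(match.group("total")),
            })
    return threads


def low_stack_threads(threads, min_free_fraction=0.10):
    """Return threads whose free stack is below the given fraction of the total."""
    return [t for t in threads if t["free"] < min_free_fraction * t["total"]]


# A few lines copied from the 2022-03-30-08:03 dump above.
SAMPLE = """\
ISR PRI=255 sp=0x20000000 STACK=1272/1536
ArduCopter PRI=182 sp=0x20000600 STACK=5264/7168
idle PRI= 1 sp=0x20013658 STACK=144/352
log_io PRI= 59 sp=0x20015CC0 STACK=144/1656
FTP PRI= 58 sp=0x1000DE18 STACK=1440/2896
"""

for thread in low_stack_threads(parse_thread_dump(SAMPLE)):
    print(f'{thread["name"]}: {thread["free"]}/{thread["total"]} bytes free')
```

Under that assumption, log_io is the only thread in the sample with less than 10% of its stack free.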

I’ll try it in flight as soon as the weather allows (it should be raining here for the next few days).
In case of “crashing again” it would be great if the buzzer worked as a “lost model beacon”.
Loading this build seemed to have an effect on the “alarm” function on a PWM output, but the buzzer is still not working as needed - please see here.

I did a few very short indoor flight tests with version 2022-03-30-08:03 and, well, it flies, but once the FC rebooted just as I set the arming switch to “ARM”.
It didn’t arm; the FC rebooted instead. Log just before reboot and log after reboot. Honestly I’m not very keen on flying it for real, maybe only above a thick cushion :roll_eyes:.

If you can do something which could make it more stable, great.

If you think that the Omnibus F4 Pro is simply too weak for bdshot, I think I can live without RPM telemetry and go back to normal one-way dshot. That should be stable, right?
Thanks for all effort!
Roman

A fix went into master 9 hours ago. Can you retest it?

Yes, I can, but I need to wait for an automatic build. The newest build I can see is still from yesterday evening, 2022-03-31.

The 4.3.0 dev build from today, Apr. 11th 2022, still contains the “watchdog” problem!
The weather here finally improved, so I could test the latest build, which already has the increased stack for the log_io thread.
During the first flight, after only 45 seconds, the FC rebooted and the copter crashed again. After the reboot, strange messages appeared on the screen again, like last time:


Here is the video from the crash and here is the dataflash log.
This time my copter was connected to Mission Planner via Crossfire, so if there could be anything interesting in the telemetry logs, please let me know what is worth uploading.

Another minor problem happened then - I’m not sure if I should create a separate topic for it:
Right there in the field I decided to reflash the FC with the 4.1.5 official release (without bdshot) to continue flying, but after that I was unable to arm and got the following message: Invalid channel option (154).
message here
Even though I “disabled” all remaining PWM outputs except the motors, rebooted several times, reverted all options to default and then re-loaded the parameters (without the buzzer on PWM channel option 154), and Mission Planner showed PWM 5-16 as “disabled”, I was still unable to arm the copter, with the same message.

I then loaded iNAV and flew two batteries without a problem. I suppose that after putting ArduCopter back, arming should work again. It seems that PWM6_OPTION remained stored somewhere deep inside and I was unable to overwrite it with option 0.

I’m sorry to hear that. You can completely erase the flash by loading plane (or inav) and then copter again - you would then have to load all of the parameters.

4.3.0-dev from today contains a number of fixes for this issue - at least on H7 - so it is troubling to hear that you still have an issue. @tridge may have some ideas about how we can diagnose further.

@rptacek please give us the tlog if you have one, and if there is a bin log from after the watchdog that as well.

I have telemetry logs from yesterday’s crashed flight - tlog and rlog.
After the reboot there are some error messages in the tlog.
No further dataflash log was created - maybe because I didn’t arm again.

I’m still having problems arming the copter after downgrading to 4.1.5 official. I did a full chip erase, but after restoring the settings file from the bdshot versions I always get “Invalid channel option (154)” and can’t arm. It’s probably some parameter from 4.3.0dev that is not backward compatible with 4.1.5, but I can’t find which one.
There is a lot of tuning in those parameters, so I don’t want to throw it away and start from scratch.
Is there any way I can locate the incompatible parameter?
My parameter file.
Thanks!

thanks for the tlog!
for the param error, just change the RCn_OPTION that is currently 154 to 41
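
If it is not obvious which channel is affected, a saved parameter file can be searched for it. This is only a sketch: it assumes the file holds one `NAME,VALUE` or `NAME VALUE` pair per line with `#` comments, and the sample values below are invented for illustration, not taken from the uploaded file.

```python
import re

# Only RCn_OPTION parameters are of interest here (RC1_OPTION, RC7_OPTION, ...).
RC_OPTION_NAME = re.compile(r"^RC\d+_OPTION$")


def parse_params(text):
    """Parse 'NAME,VALUE' or 'NAME VALUE' lines into a dict of floats."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.replace(",", " ").split()
        if len(parts) < 2:
            continue
        try:
            params[parts[0]] = float(parts[1])
        except ValueError:
            continue  # ignore lines whose value is not numeric
    return params


def find_rc_options(params, value):
    """Return the names of all RCn_OPTION parameters set to the given value."""
    return sorted(
        name for name, current in params.items()
        if RC_OPTION_NAME.match(name) and current == value
    )


# Hypothetical file content, made up for this example.
SAMPLE = """\
# saved parameters
RC6_OPTION,0
RC7_OPTION,154
RC8_OPTION 41
SERVO6_FUNCTION,0
"""

print(find_rc_options(parse_params(SAMPLE), 154))
```

Any name it prints is a parameter to change from 154 to 41 before loading the file on 4.1.5.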

@rptacek also, can you reproduce the watchdog? If it is reproducible we’d very much like to work with you to find the cause

Thanks, I replaced the channel option and can arm now, but even with 4.1.5 official (no bdshot) I have only 2 motors working (same as in this topic of mine), so downgrading isn’t an option for me.
I had assumed the timer problem was only in the -bdshot version, but obviously it’s in all versions.

It’s likely I can reproduce the watchdog crash in every flight; I just need to know how to collect the most important information. Let me know what version to test and what data to collect and I’ll try.
But since every “reproduction” of the problem means an uncontrolled crash, it’s not much fun: I have to fly the copter so that when it crashes it doesn’t harm anyone or anything, and if possible not itself either :frowning:.

can you reproduce on the ground? take off props and “fly” it on the ground
if you can reproduce without flying then we could work together to narrow down the issue

Yes, I can reproduce the watchdog reset even on the ground - I just did it.
I loaded the current 4.3.0dev -bdshot daily build from Apr. 16th and armed indoors without props - I just shook the copter by hand.
After 48 seconds of “flying” it reset, and during the next boot it showed this message:


Here is the video, the log before reboot and the log after reboot.
Over the Easter holidays I have a bit more time to play with this, so it would be perfect to start diagnosing asap.
Roman

great!
I will put a series of firmwares here for you to test:
http://uav.tridgell.net/rptacek/
the first one is there now, and changes the omnibusf4pro to use a 16 bit timer. I’m doing that to see if this issue is an error in setting up a timer for a short timeout. We’ve had errors like that in the past that cause a similar watchdog. If this is that type of error, then instead of producing a watchdog this firmware should get a 68ms delay. You could set SCHED_DEBUG=1 to see if one of those delays does get triggered.
Regardless of whether you get a watchdog on this firmware please post logs for me to take a look at.
Thanks for helping find this issue!
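
One way to read the 16 bit timer test: if a short timeout is set up wrongly and its deadline is missed, the timer would only match again after the counter wraps around. The sketch below just does that arithmetic. The 1 MHz tick rate is an assumption for illustration (the thread does not state the rate), and the function name is made up.

```python
def wrap_period_seconds(counter_bits, tick_hz):
    """Time for a free-running counter of the given width to wrap around once."""
    return (2 ** counter_bits) / tick_hz


TICK_HZ = 1_000_000  # assumed tick rate, not confirmed anywhere in this thread

for bits in (16, 32):
    period = wrap_period_seconds(bits, TICK_HZ)
    print(f"{bits}-bit counter at {TICK_HZ} Hz wraps after {period:.6f} s")
```

Under that assumption a 16 bit counter wraps after about 65.5 ms, the same order as the 68ms delay mentioned above, while a 32 bit counter takes over an hour, long enough for a watchdog to fire first.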

@rptacek has reported that the issue is fixed with the latest fw updates

@rptacek,

Just to close off this issue, my understanding is you’ve worked with Tridge and AndyPiper on this and we think this PR has fixed the issue. This fix is included in -rc2 which will be available for beta testing later today.

Yes, tridge provided me with a test build of the patched firmware and I tested it on my setup. Over 5 flights it didn’t show any abnormalities, so we consider the problem solved by the patch.
Thank you for the work of you all!
