We have noticed a recent issue with our copters and I was wondering if anybody else has seen it. We have copters with an RTK GPS (like the Here2+) as well as an mRo Location One (flashed with AP_Periph 1.1.0). If we leave a copter powered on and receiving RTCM corrections for long periods of time (around 120 minutes), we consistently find that the Location One disappears from ArduCopter. When we inspect the Location One with a Zubax Babel, we can see that it is in Maintenance mode. If we use the Babel to reboot it, it goes into Operational mode and functions in ArduCopter again.
In our troubleshooting, we have tried multiple aircraft and multiple combinations of GPS units. It turns out that we can keep the problem from happening altogether if we set the GPS_INJECT_TO param to exclude the Location One.
We're using Mission Planner to survey in and send corrections, so I'm not sure if it's a bit of bad data that we don't see very frequently, or the accumulation of data over a long period of time, or if the error is within AP_Periph, or the Location One in particular, or even Mission Planner.
Those notes are a bit messy but there is a summary at the end with the useful information. I want to add that we have observed this on two different aircraft. In each case we mitigated the problem by using the GPS_INJECT_TO param to prevent the corrections from being sent to that unit.
We have now observed a similar problem on a ZedF9 that is connected to the flight controller through an mRo KitCAN M10025B. The GPS position, compass data, and baro data all stopped being sent. It looks like the node crashed, but we are not yet sure. I will test more with this node tomorrow to see if it is the same issue and if we can narrow it down any further.
thanks, that is very helpful.
There are actually 2 issues we need to track down:
why does the node reboot into maintenance mode
why doesn't it immediately come out of maintenance mode and reset itself
The AP_Periph design is supposed to be robust against these types of failures. If it ever does reset and it had been running for more than 30s before the reset then it should bypass the bootloader wait and go straight to operating again. I need to work out why that isn't happening. The 30s check is to handle the case of a bad fw load, where you want to wait in the bootloader to get a fixed fw loaded.
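As a rough illustration of the check described above, here is a minimal Python sketch. It is not the AP_Periph source: the function name, the argument and the constant name are hypothetical, and treating an unknown uptime as "wait in the bootloader" is an assumption. Only the 30 s threshold comes from the description.

```python
# Hypothetical sketch of the bootloader decision described above.
# Names are illustrative and are not taken from the AP_Periph source.

MIN_UPTIME_FOR_FAST_BOOT_S = 30.0  # threshold quoted in the post


def should_wait_in_bootloader(uptime_before_reset_s):
    """Return True if the bootloader should wait for a firmware upload.

    uptime_before_reset_s: how long the application had been running when
    the unexpected reset happened, or None if that is unknown.
    """
    if uptime_before_reset_s is None:
        # Assumption: with no record of a healthy run, stay in the bootloader.
        return True
    # A short uptime suggests a bad firmware load, so wait for a fixed one.
    # A long uptime suggests a runtime fault, so restart the application.
    return uptime_before_reset_s <= MIN_UPTIME_FOR_FAST_BOOT_S
```

With this logic a node that crashed after two hours of RTCM injection would restart straight away, while a node that crashed five seconds after a new firmware load would stay in the bootloader.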
The good news is your debug images in that doc tell me the exact firmware version the GPS is running (it is c2dce806). That means I can set up a node here to try and reproduce the issue.
I've now built and loaded the exact firmware you were running in those debug logs, and I'll see if I can reproduce. I suspect the bug is one I've already fixed in the UART driver, but I won't know for sure unless I can reproduce and then show it doesn't happen with a newer fw.
I'm running the test now. Hopefully my node will lock up in the next couple of hours.
Cheers, Tridge
well, if you could test this fw that would be very helpful: https://firmware.ardupilot.org/AP_Periph/latest/f303-M10070/
there is a good chance the bug is already fixed.
I've set up my test to send the RTCMv3 data at 6x the normal rate in the hope I can reproduce the issue more quickly, but if you could test the latest fw from the above link in parallel that would be great.
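For anyone who wants to try a similar accelerated test, one way to speed up a recorded RTCM stream is to shrink the gaps between frames. The sketch below is only an assumption about how such a replay could be timed; it does not describe the actual test setup, and `replay_delays` is a hypothetical helper.

```python
# Hypothetical helper for replaying recorded RTCM frames faster than real time.


def replay_delays(timestamps_s, speedup=6.0):
    """Return the delay to wait before sending each frame after the first.

    timestamps_s: capture times of the recorded frames in seconds, ascending.
    speedup: rate multiplier; 6.0 compresses each gap to one sixth.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Pair each timestamp with its successor and scale the gap between them.
    return [(later - earlier) / speedup
            for earlier, later in zip(timestamps_s, timestamps_s[1:])]
```

A sender loop would sleep for each returned delay before writing the next frame to the link that feeds the flight controller.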
The problem certainly is difficult to replicate. Thank you for looking into it. And sure, I will test the new firmware ASAP. Unfortunately I have already broken down the RTK base and test rigs for the day here. I should have results Friday or Monday.
The second problem that we observed with the zedF9 and the kitcan module often occurred within 30 seconds of sending rtcm messages. If you have that hardware available, it may be worth attempting that setup. However, we have not yet inspected the uavcan messages during that failure so it may not be the same.
Sending data faster seems like it might help. We were using a Here+ as the base for the first configuration and an F9P for the second configuration (which sends significantly more rtcm data).
I've put a fixed firmware here to test: http://uav.tridgell.net/M10070/
load the AP_Periph.bin with MissionPlanner CAN UI or the UAVCAN GUI tool.
Cheers, Tridge
That looks like an insidious bug. Thank you for tracking it down! I will test the fix ASAP on Monday. I'll do a bit more digging on the zedF9 problem as well to see if it is the same. Thank you again.
This fix has now been merged into master. You should now use this firmware: https://firmware.ardupilot.org/AP_Periph/latest/f303-M10070/
I will be doing a new stable release of AP_Periph soon, once we've done some more testing. Test reports welcome!
@tridge We did some more testing with the zedF9 running through the mRo kitcan adapter. With only some rtcm corrections being sent the unit ran fine for a while. I then used u-center to add the messages for more constellations. Less than two minutes later the unit got stuck in maintenance mode. Of course this was right after the dev call.
The CAN node was running the latest version of the f303-M10070 firmware from the build server. I put the commit version below.
Branch: commit 54bae68e02d4db76406869e55f3ecc494724341c
Let me know what I can do to gather more info and to help.
RTCM Messages Sent Before Crash
1005
1077
1087
1097
1127
1230
RTCM Messages Added Right Before Crash
1074
1084
1124
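To check which message types are actually present in a captured correction stream, the RTCM 3 framing can be decoded directly: each frame starts with the preamble byte 0xD3, followed by 6 reserved bits and a 10-bit payload length, then the payload (whose first 12 bits are the message number) and a 24-bit CRC-24Q. The Python sketch below counts message numbers in a byte string. The function names and the capture workflow are assumptions for illustration, not part of Mission Planner or AP_Periph.

```python
from collections import Counter

RTCM3_PREAMBLE = 0xD3

# Names for the message types listed above.
MESSAGE_NAMES = {
    1005: "Stationary RTK reference station ARP",
    1074: "GPS MSM4",
    1077: "GPS MSM7",
    1084: "GLONASS MSM4",
    1087: "GLONASS MSM7",
    1097: "Galileo MSM7",
    1124: "BeiDou MSM4",
    1127: "BeiDou MSM7",
    1230: "GLONASS code-phase biases",
}


def crc24q(data):
    """CRC-24Q as used by RTCM 3 (polynomial 0x1864CFB, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 16
        for _ in range(8):
            crc <<= 1
            if crc & 0x1000000:
                crc ^= 0x1864CFB
    return crc & 0xFFFFFF


def count_message_types(stream):
    """Count RTCM 3 message numbers in a bytes object, skipping bad frames."""
    counts = Counter()
    i = 0
    while i + 6 <= len(stream):  # 3 header bytes + 3 CRC bytes at minimum
        if stream[i] != RTCM3_PREAMBLE:
            i += 1
            continue
        length = ((stream[i + 1] & 0x03) << 8) | stream[i + 2]
        end = i + 3 + length + 3
        if length < 2 or end > len(stream):
            i += 1  # too short for a message number, or truncated
            continue
        expected = int.from_bytes(stream[end - 3:end], "big")
        if crc24q(stream[i:end - 3]) != expected:
            i += 1  # false preamble or corrupted frame: resync
            continue
        # Message number is the first 12 bits of the payload.
        number = (stream[i + 3] << 4) | (stream[i + 4] >> 4)
        counts[number] += 1
        i = end
    return counts
```

Running `count_message_types` on a capture taken before and after changing the base configuration would show whether 1074, 1084 and 1124 really were the only additions, and `MESSAGE_NAMES.get(number, "unknown")` gives a readable label for each count.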
thanks. Are you on discord? I'd like to get your help in reproducing this. Screen share on discord would be good. See ArduPilot if you are not familiar.
I'd also like to see the debug output from the node when this happens. Do the following:
install latest master on the flight controller
set CAN_LOGLEVEL=1 on the flight controller
on the kitcan adapter, set the DEBUG parameter to 1
then inject RTCM to reproduce the issue. We should end up with some messages in the log on the flight controller (and in the messages tab in MissionPlanner) giving information about stack usage on the CAN node.
Sure. I'll send you my discord handle. I am going to get everything set up here (RTK base, master flashed). Should be ready within one hour if that works for you. If not I'll gladly schedule to talk at another time.