EKF Failsafe - f9p rtk corrections causes gps glitches

Using an F9P or M8P receiver?

Having a go at baud rate was going to be my next suggestion. Recommend 115200 as a starting point. While I trust that @ThePara has successfully used lower data rates, that may be without the need to carry MAVLink2 on the same connection.

I have not seen any EKF failsafes, GPS glitches, or other anomalous behavior when RTCM3 injection is lost. The system simply reverts to DGPS and carries on. I use a script to detect undesirable fix types and pause navigation on some of my Rovers where RTK precision is more critical.
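For anyone wondering what that kind of fix-type check looks like, here is a minimal sketch in Python. It is not the script I run on the Rovers. The fix type numbers follow the MAVLink GPS_FIX_TYPE values; the function names and the default threshold are made up for illustration.

```python
# MAVLink GPS_FIX_TYPE values (common message set).
GPS_FIX_NAMES = {
    0: "No GPS",
    1: "No fix",
    2: "2D fix",
    3: "3D fix",
    4: "DGPS",
    5: "RTK float",
    6: "RTK fixed",
}


def should_pause(fix_type, min_fix=6):
    """Return True when the fix is worse than the minimum we accept.

    min_fix=6 (RTK fixed) is an assumed threshold for illustration.
    """
    return fix_type < min_fix


def describe(fix_type):
    """Human-readable summary of the decision for a given fix type."""
    name = GPS_FIX_NAMES.get(fix_type, "Unknown")
    action = "pause" if should_pause(fix_type) else "continue"
    return f"{name}: {action}"
```

With the default threshold, a drop back to DGPS (4) would pause navigation, while RTK fixed (6) carries on.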

I suspect there is some sort of data corruption happening that is causing wildly poor behavior at present, and we just need to nail down a root cause.

I tried a test setting my SiK telemetry radios to 230400 baud. No change in the GPS glitches.

Here’s a log file of the results:

An EKF failsafe caused by loss of GPS is exactly what brought me to all of this. Per my previously noted uploaded log:

It seems that prior to copter 3.3 there was a FS_GPS parameter that specifically handled loss of GPS. Since then it’s been rolled into the EKF failsafe (FS_EKF_ACTION) - which has three options - “LAND”, “ALT-HOLD” or “LAND even in stabilize,”

I didn’t realize it but my copter triggered an EKF failsafe and had gone into “LAND”. Knowing I had lost the GPS, I switched the mode on my transmitter from LOITER to “ALT-HOLD” which enabled me to guide the copter down from about 100 meters overhead.

So if a loss of telemetry causes this GPS glitch problem, there’s a risk that, under RTK operation, a loss of telemetry could cause an EKF failsafe.

Cornel Fudulu’s comment suggests very high telemetry baud rates should be required - but I don’t know if he’s using an M8P or F9P.

I’m happy to test other baud rates - what would you suggest?

And of course I’m happy to try any other tests. I have the feeling we’re making progress.

We seem to have narrowed things to the telemetry connection. I have used SIK radios very successfully in the past. I usually set them to 115200, but I have one that was working quite well for RTCM3 injection at 57600, so baud rate is not the likely culprit.

Have a look at your telemetry radio config. Here’s mine, for comparison:

I agree - a problem with telemetry carrying the RTK corrections is looking likely.

Here are my settings - I’ve only just increased the baud rate. Before it was 57600.

Do you recall which reference you used for the other parameters?

I pretty much followed the guidance here:

https://ardupilot.org/copter/docs/common-configuring-a-telemetry-radio-using-mission-planner.html

Maybe the most glaring issue in my setting is that I don’t have ECC selected.

Your thoughts and suggestions for another trial?

I think enabling ECC may help (but see below from Tridge…as it does halve the data rate, which may really hurt!). I used the same guide.

Default transmit power is 20. I’m not sure if increasing it has any potential for inducing errors or data corruption.

Admittedly, I have not delved too deeply into SIK radio config - they usually just work!

It may be wise to update to the latest firmware from the vendor. The changelog indicates quite a few updates since 3.16.


@jstroup1986 and @Yuri_Rage I’m a bit late to this discussion. I run Holybro F9P on several vehicles, but I use NTRIP (Australia has a great free CORS service provided by Geoscience Australia, and there are two base stations within a few km of where I fly).
In the Facebook post I suggested a power issue, and certainly F9P+RTK is quite power sensitive. That caught me by surprise when I was first testing, but I had a look at one of the logs above and POWR.Vcc looks fine, mostly at 5.2V or above, so while I wouldn’t rule out power, I think it is less likely.
The suggestion about telemetry radio issue for the RTCMv3 corrections is a good one. A couple of notes though:

  • I’ve noticed that the bandwidth needed for RTCMv3 generated by a local GPS tends to be significantly higher than what is needed from NTRIP, at least for my setup. I presume it is using a more efficient encoding of the corrections, but I haven’t looked into it in detail
  • enabling ECC in SiK halves the bandwidth available for a particular air data rate
  • you can see if the uplink is saturated with RAD logs, and also check if you are getting dropouts due to signal strength

This graph shows the TxBuf state on the ground radio. You can see it sits close to 100% full, which means you are likely not getting all the data through:


I suggest the following SiK settings:

  • air data rate 128
  • serial rate at both ends of 115200
  • do not use op resend on the air side (we don’t want it taking more downlink bandwidth)

I also suggest you lower the telemetry streaming rate in MissionPlanner to 3Hz or below on all streams.
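To put rough numbers on the ECC point, here is a back-of-the-envelope sketch. The only thing taken from the notes above is that ECC halves the bandwidth available at a given air data rate. The `efficiency` factor is a hypothetical allowance for framing overhead and for the two directions sharing the link, not a measured SiK figure, and the function names are made up.

```python
def usable_air_bytes_per_s(air_rate_kbps, ecc=False, efficiency=0.5):
    """Rough payload estimate for one direction of the radio link.

    efficiency is an assumed fudge factor, not a SiK specification.
    """
    bits_per_s = air_rate_kbps * 1000 * efficiency
    if ecc:
        # ECC halves the bandwidth available for a given air data rate.
        bits_per_s /= 2
    return bits_per_s / 8


def fits(rtcm_bytes_per_s, other_bytes_per_s, air_rate_kbps, ecc=False):
    """True if the RTCM stream plus other traffic fits the estimated budget."""
    budget = usable_air_bytes_per_s(air_rate_kbps, ecc)
    return rtcm_bytes_per_s + other_bytes_per_s <= budget
```

Under these assumptions an air data rate of 128 gives an estimated budget of 8000 bytes/s without ECC and 4000 bytes/s with it, which is why turning ECC on can hurt as much as it helps.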


@tridge and @Yuri_Rage

Today I reset the SiK settings per Andrew’s suggestions. One exception - Mission Planner did not have “128” as an option for “Air Speed” - so I used “125”. I’m assuming “Air Speed” is “Air Data Rate.”

Serial at both ends is set to 115200, and Op Resend is off on the “air” side.

I made a couple other changes. Yuri had Max Window of “131”. That’s not an available option so I set “200” which is halfway between the minimum and maximum values (20-400).

I also increased my Number of Channels to 50 to match Yuri’s settings.

The GPS glitches still occur.

In addition, the HUD now gets an intermittent red warning: “Unhealthy GPS Signal”. One other observation - yesterday when Ntrip injection started the copter’s HDOP climbed from around 0.5 to up around 1.0 and above. Today, the HDOP never climbed above 0.6 when doing the Ntrip injection.

I’m wondering about the SERIAL3_BAUD (GPS) setting. The Holybro F9P docs say that the default baud rate of the receiver is 115200. But the MAVLink messages on start up say that the GPS is connected at 230400. So I set SERIAL3_BAUD=230. That may be nothing but I thought it was odd.

One last thing - the reason it took me so long to get to this today is because I had to replace my RFD900x radio-modems. Yesterday I accidentally set the baud on one to 1200. That effectively locked me out - as I could not reconnect again to reset it. I just happened to have two new RFD900x radio modems sitting on my shelf - so I just installed them. I believe they have newer firmware.

Thank you gentlemen - I appreciate you both sticking in there with me to debug this.

SERIALx baud rate for uBlox GPS modules with GPS_AUTO_CONFIG enabled is overridden. You should see a message on boot telling you that it’s detected at 230400 (I think we may have been typing at the same time - you confirmed this above).
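As a side note on the parameter values themselves: SERIALn_BAUD takes a short code rather than the full rate, so 230 means 230400. A small lookup sketch, using a subset of the codes from the ArduPilot parameter list (the helper name is made up):

```python
# Subset of SERIALn_BAUD codes from the ArduPilot parameter documentation.
BAUD_CODES = {
    1: 1200, 2: 2400, 4: 4800, 9: 9600, 19: 19200, 38: 38400,
    57: 57600, 111: 111100, 115: 115200, 230: 230400,
    460: 460800, 500: 500000, 921: 921600,
}


def baud_from_param(value):
    """Translate a SERIALn_BAUD code into bits per second (None if unknown)."""
    return BAUD_CODES.get(value)
```

So `baud_from_param(230)` gives 230400, matching the rate reported at boot.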

EDIT: you just said you had a spare set of radios. Confirm the problem persists with the new radios?

Yes - problem is the same with new radios. Even with the new settings. They’re identical models to what I’ve been using. Thanks!

Also, while not helpful to the problem at hand, it’s important to note that the air data rate is independent of the wired serial rate. You could set differing baud rates on either end and still achieve successful data transmission so long as the wired connections were set to the respective, correct rates on the GCS and autopilot. Any one of these can create a bottleneck when bandwidth is exceeded.
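A quick way to see which stage is the bottleneck is to convert each rate to bytes per second and take the smallest. This is only a sketch: it assumes 10 bits on the wire per byte (8N1 framing) for the serial ports and treats the air data rate as a raw upper bound with no protocol overhead. The function names are made up.

```python
def serial_bytes_per_s(baud, bits_per_byte=10):
    """Wired serial throughput, assuming 8N1 framing (10 bits per byte)."""
    return baud / bits_per_byte


def bottleneck(gcs_baud, air_rate_kbps, autopilot_baud):
    """Return (stage name, bytes per second) for the slowest stage."""
    stages = {
        "gcs serial": serial_bytes_per_s(gcs_baud),
        # Raw air rate only; real radio throughput will be lower.
        "air link": air_rate_kbps * 1000 / 8,
        "autopilot serial": serial_bytes_per_s(autopilot_baud),
    }
    name = min(stages, key=stages.get)
    return name, stages[name]
```

For example, 57600 on the GCS side, air data rate 128 and 115200 on the autopilot side makes the GCS serial port the limit at 5760 bytes/s.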

RTCM3 injection should not exceed the available bandwidth for your configuration as it stands, so I’m a bit stymied for the moment. Hoping @tridge has some more ideas. The radios you’re using are typically known to be of high quality and well worth using.

Have you reduced the MAVLink message data rates as he suggested?

Good catch - I missed that one. And I’m not actually sure how to set Mission Planner to lower the telemetry streaming rate - to 3Hz or below - on all streams.

By copy to @tridge - can you point me in the right direction to make those reductions? (I tried a quick google to figure it out - but didn’t find anything that seemed appropriate.)

Many Thanks!

@Yuri_Rage @tridge

Maybe this is how to set the Mission Planner streaming rates. All mine are at “2” except Attitude, which is set to “4”.

Yes, that’s the correct section for global changes. You might also check the SR1_* or SR2_* parameters in the complete list for more granular control (I don’t recall which port you’re using for the radio, but I think it’s one of the two). ADS-B, in particular, could possibly be a bandwidth hog.
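If you want to set the per-port rates in one go, something like the sketch below builds the full set of names for a port. The stream suffixes are the ones I know from the SRn_* group; the exact list can vary with firmware version, so check it against your own parameter list. The helper name and the `overrides` argument are made up.

```python
# Stream suffixes from the SRn_* parameter group (may vary by firmware version).
STREAMS = [
    "RAW_SENS", "EXT_STAT", "RC_CHAN", "RAW_CTRL", "POSITION",
    "EXTRA1", "EXTRA2", "EXTRA3", "PARAMS", "ADSB",
]


def stream_rate_params(port, rate_hz, overrides=None):
    """Build {parameter name: rate in Hz} for one telemetry port.

    overrides maps a stream suffix to a different rate, e.g. {"ADSB": 0}.
    """
    overrides = overrides or {}
    return {f"SR{port}_{s}": overrides.get(s, rate_hz) for s in STREAMS}
```

For example, `stream_rate_params(1, 2, {"ADSB": 0})` sets every SR1_* stream to 2 Hz and turns the ADS-B stream off.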

@tridge @Yuri_Rage

OK - I set all the Mission Planner telemetry rates to “1” - and still got the GPS glitches - but no red warnings about “Unhealthy GPS Signal”.

And ADSB is not active on my copter - I’m not even using an ADSB carrier board.

So then I went one step further and set just two items at telemetry rates of “1”.

Testing that, I still got GPS glitches. Perhaps a bit less severe - but got them way too often for any level of comfort.

I’m sorry our tests today weren’t more fruitful. Thanks for hanging in there with me!

One thing that occurred to me - I’m using YAAPU passthrough over CRSF - I get YAAPU telemetry on my transmitter. I know this doesn’t have anything to do with the SiK radios - but it’s just another piece of the telemetry puzzle.

Passthrough telemetry shouldn’t affect RTCM3 injects. I use it on most of my vehicles with zero detriment (all H7 processors, but varying exact hardware).

What autopilot/carrier board are you using? Perhaps there’s an impact there…?

I’ll submit that ANY GPS glitches that seemingly occur due to some particular configuration (as opposed to actual/external GPS anomalies) are 100% unacceptable.

BTW Andrew - the graph you have above showing RAD.TxBuf - that looks like a MavExplorer graph - but I don’t seem to have it on my copy of MavExplorer. Can you please tell me where you got it?

Tridge wrote most of (if not all of) the code for MAVExplorer, so I’ll defer to him for graphing there. However, here is the same graph displayed via Mission Planner’s log viewer for your most recent log:

Interestingly, I connected a little testbed Rover that uses mRo radios with SIK firmware tonight, using NTRIP RTCM3 via Mission Planner MAVLink injection and graphed the same parameter with this result:

It shows a completely saturated transmit buffer (at 115200 on both ends), yet I have had no issues at all with it during any phase of navigation. It happily zips around my yard at a surprisingly high rate of speed with RTK precision.

I can try the same test on my largest Copter (with matched firmware to yours) at a later time. It runs a Cube Orange with SIK radios whose settings I showed earlier in the thread and an M8P GPS. I do not think the exact GPS model is to blame here, so such a test should be valid (or at least a reasonable data point).

My preferred autopilot is the Cube Orange, and I have always used the awkwardly configured ADS-B carrier board simply due to availability (I don’t need ADS-B on any of my builds, even the airborne models, though I do enable it on anything non-Rover). I have achieved similar success with the entire range of Matek H743 boards without issue, and I have even gotten reliable RTK Fixed performance from an extremely cheap Pixhawk knock-off with GPS hardware far beyond its class.

This particular issue remains a mystery to me, and I am eager to find the solution!

@Yuri_Rage @tridge

OK - Thanks to Yuri’s pointing out that I can graph RAD.TxBuf in Mission Planner, I dug into the history of this on both this copter testing RTK (Hexsoon EDU450) and my other copter not involved in these tests. (Hexsoon TD650)

I copied the complete contents of the log directories from both copters’ SD cards so I could easily compare their log files. I’ve put them up on Dropbox here:

Both copters are very similar. Both use TBS Blacksheep Crossfire RC radios running Yaapu. Both have RFD900x telemetry radios. The EDU450 (the RTK subject) uses a CubePilot mini carrier board, the TD650 uses a CubePilot ADSB standard carrier board. The EDU450 uses the little “Power Brick Mini” to sense current and power the Cube - the TD650 uses a MAUCH current sensor and BEC to power the Cube.

The EDU450 has a FPV camera and VTX - installed about a year ago - but neither the camera nor the VTX touch the autopilot at all.

Looking back through these logs, the TD650 does not experience the TxBuf saturation that we’re seeing on the RTK test on the EDU450. Here’s a typical graph from the last TD650 flight - a map survey flight: (It’s the last TD650 log file in the uploaded directory.)

Going through the logs on the EDU450 - they all show high TxBuf saturation - going back to the beginning. I installed a new RFD900x on the EDU450 yesterday - it did not change the TxBuf saturation.

So while we’re only assuming at this point that the TxBuf saturation on the EDU450 is causing the GPS glitches once Ntrip injection begins, there is something obviously different in these two copters.
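To compare the two copters by numbers rather than by eyeballing graphs, a small summary like this could be run over the RAD.TxBuf values exported from each log. It is a sketch only: how the values get out of the log is left to whatever tool you use, and the threshold of 90 is an arbitrary choice for "high", not an ArduPilot constant.

```python
def txbuf_summary(samples, threshold=90):
    """Summarise a list of RAD.TxBuf values (percent).

    Returns None for an empty list. threshold is an assumed cut-off.
    """
    if not samples:
        return None
    high = sum(1 for v in samples if v >= threshold)
    return {
        "mean": sum(samples) / len(samples),
        "max": max(samples),
        "frac_high": high / len(samples),
    }
```

Running it on a log from each copter and comparing `frac_high` gives a single figure for how much of the flight was spent at high TxBuf values.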

I’m a bit curious if maybe there’s something different in the Cubes themselves - or perhaps how they are powered differently.

When I fly the TD650 I frequently have QGC as a telemetry base - I just use it for situational awareness and redundancy. But I don’t always use the base - as I have Yaapu that provides telemetry to my RC control radio. Fewer of the EDU450 flights are done with the connection to QGC as a telemetry base - but most of the longer flights do use it.

One last note - on a few of the logs there is no RAD selection on the Mission Planner log graphs. I’m not sure why this would be the case.

I’ll post this here from your Facebook post.

TxBuf is on the telemetry radio, not the flight controller. The radio reports the status of its transmit buffer to the flight controller so that the flight controller can dynamically control the packet rate and avoid overloading the radio link and dropping packets.

This is how you can have a serial rate much higher than the transmission rate without dropping packets: the radio tells the flight controller how much it can handle in real time.

This stat is only available on some specific radios that are MAVLink compatible, since they need to interact with the flight controller. The most common are the old 3DR radios and the RFD900.

Radios that provide a transparent serial link don’t need this, as serial speed = link speed.

It’s totally normal for the buffer to be 80-90% full, since the flight controller would just speed up the packet rate if the buffer started to drop, to keep it there.
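As a rough illustration of that feedback loop (not the actual ArduPilot code), the logic could look like the sketch below. It follows the description above and treats the reported number as how full the buffer is. The 80-90% band comes from the paragraph above; the step size, the cap and the function name are made up.

```python
def adjust_slowdown(slowdown_ms, fill_pct, step_ms=20, max_ms=2000):
    """Return the new delay added between telemetry packets.

    fill_pct: reported buffer fill level in percent (as described above).
    step_ms and max_ms are hypothetical tuning values.
    """
    if fill_pct > 90:
        # Buffer nearly full: send less often.
        return min(slowdown_ms + step_ms, max_ms)
    if fill_pct < 80:
        # Buffer draining: send more often, but never below zero delay.
        return max(slowdown_ms - step_ms, 0)
    # Inside the 80-90% band: leave the rate alone.
    return slowdown_ms
```

Called once per radio status report, this keeps the buffer hovering in the 80-90% band, which is why a high reading on its own is not a fault.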

Check that your radio settings are identical: copy the settings from the working copter to the one with issues. It’s possible it’s running so slow that it’s having to slow the link to the point the GPS injection stops working.

You need to check the status while it’s working. Look at rssi, noise, remnoise and remrssi to see if there are any unusual numbers by comparing between the pair of machines and pairs of radios. You could have a radio with a noisy regulator, or something jamming the radio and causing it to slow down. I used to see it with original 3DR radios that had a USB chip that would jam 433 radios.
The txbuf message is sent as a MAVLink radio status packet.
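A simple way to do that comparison is to note the four values from each copter while the link is up and diff them. The sketch below assumes you have typed the readings into plain dictionaries; the field names match the ones listed above, while the tolerance and the function names are made up.

```python
FIELDS = ("rssi", "remrssi", "noise", "remnoise")


def link_margins(stats):
    """Signal minus noise at each end, in the radio's raw units."""
    return {
        "local": stats["rssi"] - stats["noise"],
        "remote": stats["remrssi"] - stats["remnoise"],
    }


def unusual_fields(good, suspect, tolerance=15):
    """Fields where the suspect radio pair differs from the good pair.

    tolerance is an arbitrary cut-off in raw units, chosen for illustration.
    """
    return [f for f in FIELDS if abs(good[f] - suspect[f]) > tolerance]
```

If only `noise` shows up on the suspect copter, that points at something on the airframe raising the noise floor rather than at the radio settings.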