Continuing Radio Failsafes with perfect RSSI/LQ

I upgraded to Copter 4.3.5 hoping that would fix my radio failsafe issue. It has not.

I had another random, very short failsafe today and have been scouring the log for any potential issue.

If anyone could help me track this down I would REALLY appreciate it.

Failsafe Log .bin

Radiomaster RP3 Diversity RX
HappyModel 2400 Pro TX
Matek H743-Slim

I have had the same issue with a BetaFPV TX module and also with a Matek R24D RX. I do not have issues with other quads that fly much further and around obstacles.

Thanks!

Bryan

Did you try the ELRS 420k baud rate option in RC_OPTIONS?

Hi Andy,

I did. I have had that option checked for the last few tests.

Is there any additional logging I can do to help diagnose?

Hey Andy, sorry to bother you, but is there anything else you can think of on this, or anything that stands out from my log? I'm about at my wits' end with this thing :confused:

I just cannot understand why it had worked fine for multiple flights and then this, especially with LQ and RSSI showing no issue. Voltage is fine as my camera is displaying the 5v bus voltage on its own and nothing else is glitching.

Edit: apparently I had not replied directly before, so you likely didn't see it, but I have been using the 420k RC_OPTION.

The problem is that some of the ELRS frame rates are very fast, and it's easy for the flight controller to get lost in the stream if it's not reading the UART very accurately. This is one reason why using DMA is usually so essential. The baudrate mismatch made this worse. With no way of reproducing this it's quite hard to know what to do. I could reproduce the baudrate issue, but not anything else. To make matters worse, some of the ELRS RX clocks drift (Matek was particularly bad), meaning that the RX and TX could get mismatched over time.
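To illustrate what "getting lost in the stream" means, here is a minimal Python sketch of a framed serial parser that has to resynchronise after a bad byte. It is not ArduPilot's parser. The frame layout (sync byte `0xC8`, length, type, payload, CRC-8 with polynomial `0xD5`, length field capped at 62) is an assumption modelled on CRSF-style framing, and `build_frame` and `parse_stream` are hypothetical helper names.

```python
SYNC = 0xC8        # assumed CRSF-style sync/address byte
MAX_LEN = 62       # assumed upper bound for the length field


def crc8(data: bytes, poly: int = 0xD5) -> int:
    """Bitwise CRC-8, MSB first, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def build_frame(ftype: int, payload: bytes) -> bytes:
    """Frame = sync, length, type, payload, CRC (CRC covers type + payload)."""
    body = bytes([ftype]) + payload
    return bytes([SYNC, len(body) + 1]) + body + bytes([crc8(body)])


def parse_stream(buf: bytes):
    """Return (frames, skipped_bytes). On any error, slide forward one byte."""
    frames, skipped, i = [], 0, 0
    while i + 2 <= len(buf):
        length = buf[i + 1]
        if buf[i] != SYNC or length < 2 or length > MAX_LEN:
            i += 1
            skipped += 1
            continue
        end = i + 2 + length
        if end > len(buf):
            break  # incomplete frame, wait for more data
        body = buf[i + 2:end - 1]
        if crc8(body) == buf[end - 1]:
            frames.append((body[0], body[1:]))
            i = end
        else:
            # CRC failure: we cannot trust the length, so hunt for the next sync
            i += 1
            skipped += 1
    return frames, skipped
```

Feeding this three back-to-back frames with one corrupted byte in the middle frame recovers only the first and third. The point of the sketch is that a single misread byte costs at least a whole frame, and at high frame rates there is little time to recover before the next one arrives.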

What I suggest you do is reduce your ELRS frame rate and see if that helps.
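As a rough back-of-the-envelope check on why a lower frame rate gives the UART more slack, the sketch below computes how much of each frame period is spent just clocking one RC frame over the wire. The numbers are assumptions for illustration: a 26-byte RC frame, 10 bits per byte on the wire (8N1), 420000 baud for ELRS, and 416666 as the nominal CRSF baud rate it is being compared against. `uart_load` and `baud_mismatch_pct` are hypothetical helper names.

```python
def uart_load(frame_bytes: int, baud: int, rate_hz: float, bits_per_byte: int = 10):
    """Return (frame time in us, frame period in us, fraction of period used)."""
    frame_time_us = frame_bytes * bits_per_byte / baud * 1e6
    period_us = 1e6 / rate_hz
    return frame_time_us, period_us, frame_time_us / period_us


def baud_mismatch_pct(actual: int, expected: int) -> float:
    """Relative baud rate error in percent."""
    return (actual - expected) / expected * 100.0


if __name__ == "__main__":
    for rate in (500, 250, 100):
        t, p, load = uart_load(26, 420000, rate)  # assumed frame size and baud
        print(f"{rate} Hz: {t:.0f} us on the wire per {p:.0f} us period ({load:.0%})")
    print(f"baud mismatch: {baud_mismatch_pct(420000, 416666):.2f} %")
```

Under these assumptions the link is busy for a much larger share of each period at 500 Hz than at 100 Hz, so any delay in servicing the UART is more likely to run into the next frame.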

Be sure as well that you haven’t accidentally left the TX power on the transmitter at some very low value. I was very frustrated with unexpected radio failsafes like these after having made that mistake recently.

Thanks for the reply Andy!

I was using 250hz and switched down to 100hz full. I was only able to perform a 6 minute flight test as we had some heavy winds come in. During that time I did not have any failsafes, which is promising. My failsafes haven't followed a specific timing, but my gut says I should have gotten one during that time.

I have some questions on this:

  1. Is there a reason a board with AP would not support higher rates? I had been running 500 on my other quadcopters with no issues (and at greater distances than where this one failsafes).

  2. I feel like I had been advised that the DMA issue was not important. I had tried changing to a UART with DMA but was still getting failsafes, so I switched back to the previous one.

  3. Regarding clock drift, I was originally using a Matek, but the last few tests were on a Radiomaster. I can understand the drift issues, but wouldn't this likely cause LQ and RSSI drops?

  4. An interesting note, I reviewed my AP logs and when I was on 250hz, the message said rate was 500hz. When I went down to 100hz, the message said rate was 150hz. My SD logs on TX show the proper mode. Is there a chance AP is not interpreting the rate correctly? Is there any sort of debugging log that I can enable for the serial data?

Thanks!

Bryan

Hey Yuri,

I am running at 100mw which is what I’ve been on since last year (even with longer range flights). I wouldn’t think this would be an issue, especially when LQ and RSSI are solid? Plus I am testing at most 40ft from myself via LOS.

And a quick aside, I recognize your name from the HBT forums. Funny how people have similar hobbies. I feel like brewing was far easier than this hobby.

Cheers!

Bryan

I’m just suggesting things that might help. It should all work, but I have seen weirdness and without being able to reproduce I can’t say exactly what is going on.

Thanks Andy, sorry for the barrage of questions there. I'm just trying to get a deeper understanding of it all. I very much appreciate the help! I will hopefully get another test in soon to confirm the lower rate corrected the issue.

Side note: I'm an ex-IT guy so I'm really hoping I can find some way to log the issue further. The displayed rate vs set rate is very concerning, but I have no idea how to further diagnose that. I can see that being an issue though, if AP/FC thinks it should be getting 500 and I'm sending 250, whereas expecting 150 and getting 100 may be close enough not to trigger a failsafe. Maybe I will pop into the Discord to see if there are some deeper technical ways to track this. I would love it if my testing could help correct issues for others in the future.

Cheers!

Bryan

The rate is actually calculated based on the throughput, so it should be the actual rate rather than the theoretical rate.
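As a minimal sketch of what a throughput-based rate looks like (this is not the ArduPilot code, and `estimate_rate_hz` is a hypothetical name), the idea is to count the frames that actually arrived over a time window instead of reading the configured rate from anywhere:

```python
def estimate_rate_hz(timestamps_us):
    """Estimate frame rate from arrival timestamps (microseconds).

    Uses the number of intervals between the first and last frame,
    so it reflects what was received, not what was configured.
    """
    if len(timestamps_us) < 2:
        return 0.0
    elapsed_us = timestamps_us[-1] - timestamps_us[0]
    if elapsed_us <= 0:
        return 0.0
    return (len(timestamps_us) - 1) * 1e6 / elapsed_us
```

With a measurement like this, a reported 500 means roughly 500 frames per second were seen on the UART during the window, whatever the TX was set to.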

I can probably send you a debug build if this is easy to reproduce. We could at least then figure out what was triggering the failsafe.

Oh I see, so if it says 500 it’s because it is in fact getting 500, not that it thinks it should be getting 500?

This could be really useful- when I was on 250 it happened at least once per flight, sometimes twice. I suppose if it continues to work on 150 that would be fine, however it would still be nice to know where the issue is.

While I think it’d be nice to enable the super fast update rates afforded by maxing all of the ELRS params, it’s my experience that most ArduPilot applications don’t really benefit from exceedingly low latency/fast rates. I’m happy to throttle the rates to gain range and reliability. I think a good methodology includes starting with the lowest message rates and only increasing them when you can prove that as the limiting factor in control/performance.

Yeah I definitely understand the higher rates aren’t really needed in this case; I just really hate the “unknown”. It would be really nice to know WHY this is happening and also be able to maybe trigger warnings for users that haven’t gone through the whole experiment-until-no-failsafe process. I figure if we can track down the “why” it would be easier to write into documentation (i.e. Use X for starting rate in this environment, can test higher rates at your own risk).

I mean, I know and fully appreciate this is all free and open source, but with the level of mostly solid “out of the box” configuration, these random/unknown ExpressLRS issues kind of throw a wrench into the expected base stability. This may well be more of an ExpressLRS issue though, but that goes back to being able to log and test and either get them to fix it or provide workarounds (kind of like all the bad SmartAudio implementations lol).

Quick related but not on topic note: it would be REALLY nice if ExpressLRS would provide per-model settings since I may use 250-500 at 100mw for my freestyle rigs and 100 or less at 250mw+ for my long range. I have no idea why they are so resistant but apparently they don't want to change it which is unfortunate.