Reducing telemetry latency, telemetry scheduling?

bobzwik · April 10, 2023, 7:31pm

Hello,
I’m working on landing a drone on a vehicle at high speeds. The vehicle is equipped with a Cube Orange and F9P GPS, sending its position directly to the drone. The drone follows the vehicle using a modified Follow mode, and then lands using a modified Land/Follow mode.

To be able to achieve a landing at the exact spot on the vehicle, I need to account for a certain latency/delay. Without taking this into account, the drone will land progressively farther behind the target as I attempt to land at progressively higher velocities. I’ve attempted to measure this latency by comparing the logs of both the vehicle’s Cube and the drone’s Cube. Using GPS time, I can see when the vehicle has calculated its position (POS structure of its log) and when the drone uses that very same position in the FOLL structure of its log. This difference in time has been constant every time I checked (over 20 times over the span of 3 months), with a value of 160 ms. If I compensate for a 160 ms delay in our Follow mode (target_position = received_vehicle_position + vehicle_velocity*delay), our drone lands precisely on target.
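A minimal sketch of that constant-velocity compensation, written in Python for illustration only (the real change lives in the modified Follow mode, which is not shown in this thread). The function name and the north/east tuple layout are assumptions.

```python
def compensate_target(position_ne, velocity_ne, delay_s):
    """Project a received target position forward by the link delay.

    position_ne: (north, east) of the target in metres, as received.
    velocity_ne: (north, east) velocity of the target in m/s.
    delay_s: telemetry delay in seconds, assumed constant.
    """
    north, east = position_ne
    v_north, v_east = velocity_ne
    # Constant-velocity model: valid only while the target is not
    # turning, accelerating or braking during the delay.
    return (north + v_north * delay_s, east + v_east * delay_s)


# Target heading due north at 60 km/h, with the measured 160 ms delay.
speed_ms = 60 / 3.6
predicted = compensate_target((0.0, 0.0), (speed_ms, 0.0), 0.160)
print(predicted)
```

At 60 km/h this shifts the target forward by roughly 2.7 m, which gives a sense of how far behind the drone would land without the correction.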

While this patch works, it only works when the vehicle is travelling at a constant speed. Following performance degrades in turns and when the vehicle accelerates or decelerates. I’ve been attempting to reduce this delay to reduce the effect of turns/accelerations, but I haven’t had much success.

Our initial trials were with RFD900 for communication between the vehicle and drone. Since then, I’ve tried:

RFD900 in low-latency mode and minimum “max-window”, but still 160 ms delay,
ESP32 running a Mavlink compatible ESP-Now script (original code by @Yuri_Rage), but still 160 ms delay,
same ESP32 script, but reducing the number of Mavlink messages to only those necessary, but still 160 ms delay
all previous trials were at a serial baudrate of 57600. By increasing the baudrate on the vehicle and drone to 921600, I was able to reduce the delay to 135 ms

If I did my math correctly, the reduced latency achieved by transmitting bits 16x more rapidly indicates that there is an Ardupilot scheduling delay of about 133 ms, while the leftover delay is associated with bit writing and reading. What I mean by scheduling delay is that the vehicle’s Cube transmits its position 133 ms after it calculates it.
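The arithmetic behind the 133 ms figure can be reproduced with a short sketch. It assumes a simple two-part model (total delay = fixed part + serial part, where only the serial part shrinks in proportion to the baudrate); the function name is hypothetical.

```python
def split_delay(delay_slow_ms, delay_fast_ms, baud_ratio):
    """Separate a fixed delay from one that scales inversely with baudrate.

    Assumed model: total = fixed + serial, and the serial part is divided
    by baud_ratio when the baudrate is raised.
    """
    # delay_slow - delay_fast = serial_slow * (1 - 1 / baud_ratio)
    serial_slow = (delay_slow_ms - delay_fast_ms) / (1 - 1 / baud_ratio)
    fixed = delay_slow_ms - serial_slow
    return fixed, serial_slow


# 160 ms at 57600 baud, 135 ms at 921600 baud (16x faster).
fixed_ms, serial_ms = split_delay(160, 135, 921600 / 57600)
print(f"fixed: {fixed_ms:.1f} ms, serial at 57600 baud: {serial_ms:.1f} ms")
# prints "fixed: 133.3 ms, serial at 57600 baud: 26.7 ms"
```

Under this model, raising the baudrate further can recover at most the remaining 1.7 ms or so of serial time; the rest has to come from somewhere else.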

Question: What would I need to do to schedule the transmission of the GLOBAL_POSITION_INT message right after the position estimation?

I’m aware that I am technically measuring the logging latency by comparing logs, but if we consider similar logging delays on both Cubes, then we can consider the measured latency to be telemetry latency. Also, compensating using the measured latency allows for precise landings.

amilcarlucas · April 11, 2023, 11:09am

Tridge had a patch to automatically account and correct GPS data. But that patch never got merged, you could try to revive it.

bobzwik · April 11, 2023, 12:34pm

Thanks for your input! However, I’m not sure I quite follow. Are you saying that the message GLOBAL_POSITION_INT is constructed from the delayed GPS data? It was my understanding that GLOBAL_POSITION_INT was formed using EKF positional data (local position in meters), transformed into GPS coordinates (lat, lon, alt), like the POS log structure. And this EKF positional data would not be delayed like the raw GPS position.

Or am I wrong in assuming that?

amilcarlucas · April 11, 2023, 1:18pm

One problem at a time please:

Problem 1 - (my point) Having a better EKF solution. edit 2023-04-14: this has already been solved with Adjust GPS timestamps using UART latency estimate by tridge · Pull Request #8409 · ArduPilot/ardupilot · GitHub

Problem 2 - (your point) using the EKF position ASAP.

My advice is to first solve the first issue, and then go for the second.

bobzwik · April 11, 2023, 2:36pm

Sorry, I don’t fully understand your line of thinking.

Are you talking about taking into account the GPS delay in the EKF? I thought this was already the case with parameter GPS_DELAY_MS. I have this parameter set to 0, to use the default driver-specified delays. In AP_NavEKF3_Measurements.cpp, a delay is obtained from (I assume) AP_GPS.cpp, and if _delay_ms is not specified by the user, it uses the driver-specified lag. I’m using a Ublox F9P on UART, so I’m assuming the delay used is 120 ms (from here, F9 is assumed to be likely as good as M8). And since I’m using Moving Baseline on my vehicle’s Cube (using dual F9P GPS for heading, as the compasses are affected by the vehicle’s steel), it seems an extra 40 ms of lag is considered.

So a total GPS delay of 160 ms should theoretically be considered in the EKF3. By comparing the log structure POS and GPS, the GPS latitude, longitude and altitude are indeed late by about 160 ms compared to the POS (EKF3) lat, long and alt. A final validation to see if a 160 ms delay is really considered would be to enable LOG_REPLAY and view the RGPI log structure which contains the GPS delay used by the EKF.

Please correct me if I’m wrong with my analysis, or if I am misunderstanding you.

bobzwik · April 11, 2023, 2:47pm

Or are you saying that perhaps I’m using a wrong gps delay?

Maybe since I’m also using RTK from a base station (and the RTCM messages are sent from the GCS to the drone, then from the drone to the vehicle) the GPS delay is in fact greater than 160 ms, due to delays in receiving the RTCM messages and to extra compute time? If I specified a GPS_DELAY_MS of 290 ms (driver-specified delay + my 130 ms “latency” delay), maybe that would solve my latency issues? But 290 ms seems absurdly high, no?

amilcarlucas · April 11, 2023, 3:41pm

The GPS delay is a fixed parameter, and it depends on the baudrate, protocol and GNSS receiver FW used.
The stuff tridge did estimates these delays and corrects them. https://github.com/ArduPilot/ardupilot/pull/8409

No, I also do not think the RTCM delays have any relevance here, as long as they are smaller than 2 seconds.

bobzwik · April 12, 2023, 1:20am

You’ll have to excuse me, but I am really not sure what you are suggesting.

My advice is to first solve the first issue, and then go for the second.

I don’t know what you mean by my first and second issue. In case I badly expressed myself, my issue is that the drone is using a delayed target position. The following figure is an example of that issue. At a certain Time of Week reported by the drone’s GPS and the target GPS, the drone (FOLL) is using a past position of the target (POS). The offset between both lines is 160 ms, but I managed to reduce this offset to 135 ms by increasing the serial baudrate on both the target and the drone.

I know that using the logs to measure this latency isn’t the ideal way to go about it, since logs have a lower priority in the scheduler. But if I compensate the Follow mode with this approximate delay, the drone lands perfectly on target (instead of 2 meters behind target at 60 km/h).

My issue is I’m trying to find the source of this latency, and find out how to reduce it. I’ve already figured out that part of the delay is due to bit writing/reading, which is reduced by increasing the baudrate. However, I still have to account for 130 ms of latency, and I do not know where that comes from.

My initial hypothesis is that telemetry has a low priority in the scheduler, causing this delay. I’m most likely wrong (I do not fully understand the structure of the scheduler and when GLOBAL_POSITION_INT is sent), which is why I really appreciate your (and others’) opinions about this.

I unfortunately don’t understand what you’ve been trying to suggest to me. Would you mind explaining it a little bit further? I would really appreciate it! You were saying that I’m trying to fix 2 issues at once, which 2 issues?

rfriedman · April 12, 2023, 4:17am

I’m not sure on the timeline for your project, but the DDS interface is designed to be as low latency as possible to support control commands. These control commands will be implemented soon.

Do you happen to have a companion computer able to run ROS 2 on your aircraft?

amilcarlucas · April 14, 2023, 10:36am

I edited my post above for clarity. I noticed just now that problem 1 has already been solved in Adjust GPS timestamps using UART latency estimate by tridge · Pull Request #8409 · ArduPilot/ardupilot · GitHub. You just need to fly aggressively above 6 m/s and analyze the logs to get a very accurate GPS lag reading.

So, I now agree with you, the telemetry scheduling should be improved to reduce the latency of those GLOBAL_POSITION_INT messages.

bobzwik · April 14, 2023, 1:32pm

@rfriedman Sorry for the delayed response (no pun intended)! I had time to check out your work on integrating DDS into Ardupilot and watched your Ardupilot Conference video, and what you’re building is quite exciting! It most likely would greatly reduce the latency. We currently do not have companion computers on our drone or vehicle; our control is simple enough to be implemented in Ardupilot. My colleague and I also haven’t yet developed with ROS (we are mech eng grads), but I’ve worked on other projects with other colleagues where they connected Ardupilot to a ROS computer. However, no one in our lab has started using ROS2.

Our trials are planned for mid-to-end of summer, so it is unlikely that we’ll be able to spend the hours learning enough of ROS2 and DDS to implement this (but it is not impossible). I will certainly keep an eye on your progress, and will recommend the switch to ROS2 to my colleagues when the time comes to use Ardupilot + ROS.

If we do get the time to work on this, would the simplest setup look like this:

Companion computer on vehicle running ROS2,
Vehicle EKF3 publishing pos/vel/acc/heading states as soon as they are calculated,
XRCE-DDS/Micro-ROS via physical UART connection to vehicle Pixhawk to retrieve pos/vel/acc/heading states ASAP,
XRCE-DDS/Micro-ROS via ESP32 UART-bridge connection to drone Pixhawk to provide pos/vel/acc/heading states ASAP to drone’s Follow Mode,
Drone’s Follow Mode subscribing to vehicle’s pos/vel/acc/heading states.

The companion computer would be on the vehicle, as to not add weight to the drone. A Raspberry Pi would suffice I imagine?

bobzwik · April 14, 2023, 1:51pm

@amilcarlucas Ahhh thanks! I understand now!

Have you ever played with the scheduler? How would you go about this? I could see 3 possible implementations:

Increasing the priority number of GCS - update_send on the vehicle and GCS - update_receive on the drone
Creating a new task to only send GLOBAL_POSITION_INT with higher priority
Directly calling telemetry functions from the EKF3 to send GLOBAL_POSITION_INT after states are calculated.

I’m unsure about how increasing the priority of low-priority tasks will affect Ardupilot’s behavior, and if calling telemetry functions directly from the EKF3 could break the EKF3. Would you know someone who has worked with the scheduler? Thanks!

rfriedman · April 14, 2023, 5:20pm

Got it. With our GSoC project proposals, we expect to be doing some sort of control. It could be largely copy/paste.

Yes, your approach to run the companion computer on the vehicle sounds fine as long as the wireless link meets your bandwidth/latency requirements. With MicroXRCE DDS’s support for both reliable and unreliable transfer over multiple streams, it should provide you the ability to tune it to suit your application needs.

Yep, Raspberry Pi should work. I’ll probably be starting my companion computer work with a Pi 4 soon, stay tuned.

bobzwik · February 7, 2024, 6:45pm

I’ll just give an update about my latency reductions.
So in the end, we stayed with Mavlink.

By using different telemetry links (ESP32 vs RFD900), we increased the UART baudrate on both the vehicle Pixhawk and the drone Pixhawk (from 57600 to 921600 baud), reducing the latency from 160 ms to 135 ms.

By increasing the priority of the GCS - update_send task in the scheduler to priority #2, we further reduced the telemetry latency to 25 ms. We were pretty pleased with that!

@rfriedman for a future project, I’m interested in using micro-DDS. DDS currently works through the serial ports, but if I’ve read correctly, you are developing DDS through an Ethernet link on Ethernet-capable FCs, right? In your opinion, would DDS over Ethernet have a lower latency than over serial? Have you measured the latency for DDS over serial? I could imagine it being lower than the 25 ms obtained with Mavlink. If you think DDS over Ethernet has potential, then I’ll probably be purchasing a Pixhawk 6X soon.

amilcarlucas · February 7, 2024, 7:02pm

Nice to hear that, we are about to start testing ESP32 at 460800.
Maybe we will test the scheduler at priority #2 as well.

bobzwik · February 7, 2024, 8:36pm

Just to let you know, increasing the priority of the Mavlink tasks did not create long loops in our case. Everything ran smoothly at 400 Hz.

Our initial method of measuring the latency stopped working when we changed the priority though. I explained it further up here, but to summarize:

The target Pixhawk was on a vehicle and sending its position to the drone Pixhawk (which could be either on the vehicle or on the ground, as long as it can receive the position data).
I would then take both logs from the drone and the target, and display the target’s position logged on the target and the target’s position logged on the drone (using the FOLL logs created by the Follow mode), as a function of GPS time (which should be synchronized between both Pixhawks).
I can either measure the latency as the gap between both curves, or I could run a cross-correlation algorithm to find the shift between both curves.
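The cross-correlation option can be sketched as below, assuming both logs have already been resampled onto the same GPS-time grid. The data here is synthetic (not from our logs), the function names are hypothetical, and the score is a normalized (Pearson) cross-correlation evaluated over candidate lags. Note that the signal needs some variation (turns, speed changes); a perfectly straight, constant-speed run looks the same at every lag.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def estimate_lag(reference, delayed, max_lag):
    """Lag in samples (0..max_lag) at which `delayed` best matches `reference`."""
    n = len(reference)
    # Compare reference[i] with delayed[i + lag] over the overlapping part.
    scores = [
        pearson(reference[: n - lag], delayed[lag:])
        for lag in range(max_lag + 1)
    ]
    return max(range(max_lag + 1), key=scores.__getitem__)


SAMPLE_RATE_HZ = 100  # assumed resampling rate
TRUE_LAG = 16         # 160 ms at 100 Hz, built into the synthetic data


def wiggle(k):
    """Synthetic position-like signal with two frequency components."""
    t = k / SAMPLE_RATE_HZ
    return math.sin(2 * math.pi * 0.5 * t) + 0.5 * math.sin(2 * math.pi * 1.3 * t)


base = [wiggle(k) for k in range(1000 + TRUE_LAG)]
logged_on_target = base[TRUE_LAG:]  # position as logged by the target
logged_on_drone = base[:1000]       # same values, seen TRUE_LAG samples later

lag = estimate_lag(logged_on_target, logged_on_drone, max_lag=50)
print(lag * 1000 / SAMPLE_RATE_HZ, "ms")
```

On this synthetic data the search recovers the 16-sample (160 ms) shift. The resolution is limited to one sample period, so the logs would need to be resampled finely enough for the latency of interest.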

This method stopped working when increasing the priority of the update_send task on the target, maybe because there would be a latency between sending the position and logging the position?

My new method for measuring latency is:

Modify the Ardupilot code to log the “distance to target”.
Place both Pixhawks on the vehicle, with the target Pixhawk (sender) 1 meter in front of the drone pixhawk (receiver).
Drive at different speeds.
The logged “distance to target” should be 1 m at rest, but will decrease as a function of the velocity. This decrease in “distance to target” can then be used to calculate the latency.

This method doesn’t require both target and drone logs, or having them synced with GPS time. You also technically don’t need to modify the Ardupilot code to log the “distance to target”; you can compute it in post-processing afterwards.
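One way to turn runs at several speeds into a single latency number is a straight-line fit, sketched below. It assumes the model distance = rest_distance - latency * speed; the function name is hypothetical and the sample values are made up for illustration, not taken from our logs.

```python
def fit_latency(speeds_ms, distances_m):
    """Least-squares fit of distance = rest_distance - latency * speed.

    Returns (latency_s, rest_distance_m).
    """
    n = len(speeds_ms)
    mean_v = sum(speeds_ms) / n
    mean_d = sum(distances_m) / n
    cov = sum((v - mean_v) * (d - mean_d) for v, d in zip(speeds_ms, distances_m))
    var = sum((v - mean_v) ** 2 for v in speeds_ms)
    slope = cov / var
    intercept = mean_d - slope * mean_v
    # The slope is negative (distance shrinks with speed); latency is its magnitude.
    return -slope, intercept


# Made-up samples: 1 m separation and a 25 ms latency.
speeds = [0.0, 5.0, 10.0, 15.0, 20.0]
distances = [1.0 - 0.025 * v for v in speeds]
latency_s, rest_m = fit_latency(speeds, distances)
print(f"latency: {latency_s * 1000:.1f} ms, separation at rest: {rest_m:.2f} m")
```

Fitting the intercept as well means the physical separation does not have to be measured to the centimetre; it falls out of the data.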

rfriedman · February 9, 2024, 4:12am

Yes, it actually should work on serial and ethernet, we just have a bug in the eProsima library in their custom transport implementation, but no one has time or knowledge to chase it down. I’m hoping we can resolve it by the time 4.5 is out.

With the priority of update_send increased, I would not expect much difference compared to DDS performance, as long as you are pushing messages at a high enough rate.

How are you measuring latency between the two devices?

bobzwik · February 10, 2024, 7:50pm

Nice! I’m looking forward to finding the time to play around with DDS!

To measure the latency, we:

Logged the “Distance to target” (N-E-D) in the FOLL log structure. I don’t remember if we modified the code to log these 3 values or if they are originally logged.
Placed 2 Pixhawks on the roof of a car, both using RTK from a stationary base station. The target Pixhawk is placed 1 m in front of the follower Pixhawk.
Enabled Follow Mode and started driving, as fast as possible, and reached a constant velocity.
Studied the logs. At rest, the distance to target obtained from DN and DE should be 1 m (in front of the follower). But at a high constant velocity, it will be less than 1 m, or perhaps the follower will think it is in front of the target. With the logged speed and position offset, we calculated the delay.
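For a single constant-velocity sample, the calculation in the last step can be sketched as follows. The sign convention (DN/DE pointing from follower to target) and the function name are assumptions, and the numbers in the usage lines are made up.

```python
import math


def latency_from_offset(dn_m, de_m, vn_ms, ve_ms, rest_distance_m=1.0):
    """Latency implied by one constant-velocity sample.

    dn_m, de_m: logged distance to target, north and east (metres),
        assumed to point from the follower to the target.
    vn_ms, ve_ms: vehicle velocity, north and east (m/s).
    rest_distance_m: true along-track separation (1 m in this setup).
    """
    speed = math.hypot(vn_ms, ve_ms)
    if speed == 0:
        raise ValueError("latency is not observable at rest")
    # Signed separation along the direction of travel; a negative value
    # means the follower believes it is ahead of the target.
    along_track = (dn_m * vn_ms + de_m * ve_ms) / speed
    return (rest_distance_m - along_track) / speed


# Driving north at 20 m/s, target appears only 0.5 m ahead: 25 ms.
print(latency_from_offset(0.5, 0.0, 20.0, 0.0))
# Same speed, follower believes it is 2.2 m ahead of the target: 160 ms.
print(latency_from_offset(-2.2, 0.0, 20.0, 0.0))
```

Projecting onto the velocity direction keeps the result independent of the heading, so the car does not need to drive along a cardinal direction.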

Perhaps an easier way would be to transmit the GMS (GPS time in ms since the start of the week) from one Pixhawk to the other, and have the receiving Pixhawk log the difference between its GMS and the received GMS. I think there are a couple of Mavlink messages that send that value. But since many GPS units update at only 5 or 10 Hz, I don’t know how that will affect the GMS subtraction.

bobzwik · February 10, 2024, 8:34pm

I also made sure to disable the REQUEST_DATA_STREAM from Mission Planner and specified the rates for specific messages only, for each telemetry port on both Pixhawks (as shown here). We kept only the strict minimum of messages our application required, to (hopefully) minimise latency.

Georacer · February 12, 2024, 1:02am

Hi @bobzwik ! Thank you very much for this thread!

We’ve had our fair share of issues trying to land a Copter on a moving ship. Part of these woes I hope are addressed with this PR or its twin.

However, I didn’t expect the transport latency to be such a big issue, as by reading the code I was expecting this line to take care of it. Did you know of it? Do you have reasons to believe it’s not working as it should?

My suspicion (at least for our setup) is that the culprit for most of the inaccuracies is the dissimilar GNSS setups on target and follower. They are two different brands and models of GNSS receivers. What we haven’t done, and I think it would help, would be to set the target as the roving base-station for a relative RTK precision solution on the follower.

Can you talk a bit more about your GNSS setup?

BTW, I loved your 2nd experimental method to measure the latency!