Reducing telemetry latency, telemetry scheduling?

@rfriedman Sorry for the delayed response (no pun intended)! I had the time to check out your work integrating DDS into Ardupilot, watched your Ardupilot Conference video, and what you’re building is quite exciting! It would most likely greatly reduce the latency. We currently do not have companion computers on our drone or vehicle; our control is simple enough to be implemented in Ardupilot. My colleague and I also haven’t yet developed with ROS (we are mech eng grads), but I’ve worked on other projects where colleagues connected Ardupilot to a ROS computer. However, no one in our lab has started using ROS2.

Our trials are planned for mid-to-late summer, but it’s unlikely that we’ll be able to spend the hours learning enough ROS2 and DDS to implement this (though it’s not impossible). I will certainly keep an eye on your progress, and will recommend the switch to ROS2 to my colleagues when the time comes to use Ardupilot + ROS.

If we do get the time to work on this, would the simplest setup look like this:

  • Companion computer on vehicle running ROS2,
  • Vehicle EKF3 publishing pos/vel/acc/heading states as soon as they are calculated,
  • XRCE-DDS/Micro-ROS via physical UART connection to vehicle Pixhawk to retrieve pos/vel/acc/heading states ASAP,
  • XRCE-DDS/Micro-ROS via ESP32 UART-bridge connection to drone Pixhawk to provide pos/vel/acc/heading states ASAP to drone’s Follow Mode,
  • Drone’s Follow Mode subscribing to vehicle’s pos/vel/acc/heading states.

The companion computer would be on the vehicle, so as not to add weight to the drone. A Raspberry Pi would suffice, I imagine?

@amilcarlucas Ahhh thanks! I understand now!

Have you ever played with the scheduler? How would you go about this? I can see 3 possible implementations:

  • Increasing the priority number of GCS - update_send on the vehicle and GCS - update_receive on the drone
  • Creating a new task to only send GLOBAL_POSITION_INT with higher priority
  • Directly calling telemetry functions from the EKF3 to send GLOBAL_POSITION_INT after states are calculated.

I’m unsure about how increasing the priority of low-priority tasks will affect Ardupilot’s behavior, and if calling telemetry functions directly from the EKF3 could break the EKF3. Would you know someone who has worked with the scheduler? Thanks!

Got it. With our GSoC project proposals, we expect to be doing some sort of control. It could be largely copy/paste.

Yes, your approach to run the companion computer on the vehicle sounds fine as long as the wireless link meets your bandwidth/latency requirements. With MicroXRCE DDS’s support for both reliable and unreliable transfer over multiple streams, it should provide you the ability to tune it to suit your application needs.

Yep, Raspberry Pi should work. I’ll probably be starting my companion computer work with a Pi 4 soon, stay tuned.

I’ll just give an update about my latency reductions.
So in the end, we stayed with Mavlink.

By switching telemetry links (from the RFD900 to an ESP32 bridge), we were able to increase the UART baudrate on both the vehicle Pixhawk and the drone Pixhawk (from 57600 to 921600 baud), reducing the latency from 160 ms to 135 ms.
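For intuition on how much of that gain the wire itself accounts for, here is a rough back-of-the-envelope sketch (my assumptions, not from the thread: MAVLink 2 framing adds about 12 bytes of overhead, and 8N1 UART puts 10 bits on the wire per byte):

```python
# Back-of-the-envelope UART transmission time for one MAVLink message.
# Assumptions (illustrative): ~12 bytes of MAVLink 2 framing overhead,
# 8N1 UART (start bit + 8 data bits + stop bit = 10 bits per byte).

def tx_time_ms(payload_bytes: int, baud: int, overhead_bytes: int = 12) -> float:
    bits_on_wire = (payload_bytes + overhead_bytes) * 10
    return 1000.0 * bits_on_wire / baud

# GLOBAL_POSITION_INT carries a 28-byte payload.
for baud in (57600, 921600):
    print(f"{baud:>6} baud: {tx_time_ms(28, baud):.2f} ms")
```

By this estimate the wire time alone only drops by about 6.5 ms, so most of the measured improvement presumably comes from radio buffering and scheduling rather than raw serial speed.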

By increasing the priority of the GCS - update_send task in the scheduler to priority #2, we further reduced the telemetry latency to 25 ms. We were pretty pleased with that!

@rfriedman for a future project, I’m interested in using micro-DDS. DDS currently works through the serial ports, but if I’ve read correctly, you are developing DDS over an Ethernet link on Ethernet-capable FCs, right? In your opinion, would DDS over Ethernet have lower latency than over serial? Have you measured the latency of DDS over serial? I could imagine it being faster than the 25 ms obtained with Mavlink. If you think DDS over Ethernet has potential, then I’ll probably be purchasing a Pixhawk 6X soon.

Nice to hear that, we are about to start testing the ESP32 at 460800.
Maybe we’ll test the scheduler at priority #2 as well.

Just to let you know, increasing the priority of the Mavlink tasks did not create long loops in our case. Everything ran smoothly at 400 Hz.

Our initial method of measuring the latency stopped working when we changed the priority though. I explained it further up here, but to summarize:

  • The target Pixhawk was on a vehicle, sending its position to the drone Pixhawk (which could be either on the vehicle or on the ground, as long as it could receive the position data).
  • I would then take the logs from both the drone and the target, and plot the target’s position as logged on the target and as logged on the drone (using the FOLL logs created by the Follow mode), as a function of GPS time (which should be synchronized between both Pixhawks).
  • I could then either measure the latency as the gap between both curves, or run a cross-correlation algorithm to find the shift between them.
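The cross-correlation option in the last bullet can be sketched roughly like this (pure Python, with synthetic data standing in for the two position series resampled onto a common GPS-time grid; all names are illustrative, not ArduPilot code):

```python
# Find the sample shift that best aligns the sender's logged position
# with the receiver's logged copy of it. A minimal brute-force version
# of cross-correlation, assuming both series share one uniform time grid.

def best_shift(sender: list[float], receiver: list[float], max_shift: int) -> int:
    """Return the shift that minimises mean squared error between
    sender[i] and receiver[i + shift]."""
    best, best_err = 0, float("inf")
    for s in range(max_shift + 1):
        pairs = list(zip(sender, receiver[s:]))
        err = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if err < best_err:
            best, best_err = s, err
    return best

# Synthetic example: the receiver sees the same ramp 4 samples late.
dt_ms = 40  # assumed logging period
truth = [0.1 * i for i in range(100)]
delayed = [0.0] * 4 + truth[:-4]
shift = best_shift(truth, delayed, max_shift=10)
print(f"latency ~ {shift * dt_ms} ms")  # prints: latency ~ 160 ms
```

A real implementation would also need to handle the shrinking overlap at large shifts and non-uniform log timestamps, but this shows the idea.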

This method stopped working when we increased the priority of the update_send task on the target, maybe because there is now a latency between sending the position and logging it?

My new method for measuring latency is:

  • Modify the Ardupilot code to log the “distance to target”.
  • Place both Pixhawks on the vehicle, with the target Pixhawk (sender) 1 meter in front of the drone pixhawk (receiver).
  • Drive at different speeds.
  • The logged “distance to target” should be 1 m at rest, but will decrease as a function of velocity. This decrease in “distance to target” can then be used to calculate the latency.

This method doesn’t require both target and drone logs, or syncing them with GPS time. You also technically don’t need to modify the Ardupilot code to log the “distance to target”; you can post-process that afterwards.
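The arithmetic behind this method can be sketched as follows (the numbers are illustrative; the assumption is that at constant speed v the measured distance shrinks from its static value by v × latency):

```python
# Latency from the shrinkage of the logged "distance to target":
# at constant speed v, measured_distance = static_distance - v * latency.

def latency_s(static_dist_m: float, moving_dist_m: float, speed_mps: float) -> float:
    return (static_dist_m - moving_dist_m) / speed_mps

# Illustrative numbers: 1 m at rest, 0.16 m measured at 19 km/h.
v = 19 / 3.6  # km/h -> m/s
print(f"latency ~ {latency_s(1.0, 0.16, v) * 1000:.0f} ms")
```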

Yes, it actually should work on serial and ethernet, we just have a bug in the eProsima library in their custom transport implementation, but no one has time or knowledge to chase it down. I’m hoping we can resolve it by the time 4.5 is out.

With the priority of update_send increased, I would not expect much difference in DDS performance as long as you are pushing messages at a high enough rate.

How are you measuring latency between the two devices?

Nice! I’m looking forward to finding the time to play around with DDS! :slight_smile:

To measure the latency, we:

  • Logged the “Distance to target” (N-E-D) in the FOLL log structure. I don’t remember if we modified the code to log these 3 values or if they are logged by default.
  • Placed 2 Pixhawks on the roof of a car, both using RTK from a stationary base station. The target Pixhawk was placed 1 m in front of the follower Pixhawk.
  • Enabled Follow Mode, started driving as fast as possible, and held a constant velocity.
  • Studied the logs. At rest, the distance to target obtained from DN and DE should be 1 m (in front of the follower). But at a high constant velocity it will be less than 1 m, or perhaps the follower will even think it is in front of the target. From the logged speed and position offset, we calculated the delay.

Perhaps an easier way would be to transmit the GMS (GPS time in ms since the start of the week) from one Pixhawk to the other, and have the receiving Pixhawk log the difference between its GMS and the received GMS. I think there are a couple of Mavlink messages that send that value. But since many GPS modules update at only 5 or 10 Hz, I don’t know how that would affect the GMS subtraction.
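A sketch of that GMS-difference idea (the modulo handles the week rollover; which Mavlink message carries the value, and how to deal with the coarse GPS update rate, are left aside):

```python
# Latency from the difference of GPS time-of-week values, which wrap
# every week. Quantisation from a 5-10 Hz GPS would make single samples
# coarse, so averaging over many samples would likely be needed.
WEEK_MS = 7 * 24 * 3600 * 1000  # GPS time-of-week wraps here

def gms_latency_ms(local_gms: int, received_gms: int) -> int:
    return (local_gms - received_gms) % WEEK_MS

print(gms_latency_ms(120_500, 120_340))   # plain case: prints 160
print(gms_latency_ms(100, WEEK_MS - 60))  # across the wrap: prints 160
```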

I also made sure to disable REQUEST_DATA_STREAM from Mission Planner and specified rates for specific messages only, for each telemetry port on both Pixhawks (as shown here). Only the strict minimum of messages our application required, to (hopefully) minimise latency.

Hi @bobzwik ! Thank you very much for this thread!

We’ve had our fair share of issues trying to land a Copter on a moving ship. Part of these woes I hope are addressed with this PR or its twin.

However, I didn’t expect the transport latency to be such a big issue, as from reading the code I expected this line to take care of it. Did you know of it? Do you have reasons to believe it’s not working as it should?

My suspicion (at least for our setup) is that the culprit for most of the inaccuracies is the dissimilar GNSS setups on target and follower. They are two different brands and models of GNSS receivers. What we haven’t done, and I think it would help, would be to set the target as the roving base station for a relative RTK precision solution on the follower.

Can you talk a bit more about your GNSS setup?

BTW, I loved your 2nd experimental method to measure the latency!

This may be useful.

Hi @Georacer !
I don’t think the _jitter.correct_offboard_timestamp_msec(packet.time_boot_ms, AP_HAL::millis()); corrects for transport latency, only for jitter. The inputs to this function are:

  • “time since boot” (in ms) of the target Pixhawk when it sent the GLOBAL_POSITION_INT Mavlink message,
  • and the “time since boot” (in ms) of the Pixhawk running the Follow mode.

Since there is no “absolute time” being used, the following Pixhawk can’t know that there is a 160 ms latency between the two Pixhawks. It can however smooth out the jitter, which is when the difference between both millisecond measurements (packet.time_boot_ms and AP_HAL::millis()) keeps changing. This jitter correction turned out to be really important; it really smoothed out the received target location. But on top of this variable jitter, there is a constant latency. Most of this latency comes from the low Mavlink priority: the target Pixhawk estimates its position, but then sends it 100-130 ms later. Even the low baudrate of the RFD900 (57600) added a significant delay (which was reduced by 25 ms using different radios at 921600 baud).

Well, that’s at least my understanding of _jitter.correct_offboard_timestamp_msec. Please correct me if I’m wrong! :slight_smile:
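To illustrate why a constant latency slips past this kind of correction, here is a much-simplified sketch of the idea (assumed behaviour, not the actual ArduPilot JitterCorrection code):

```python
# Track the smallest observed (local - remote) clock offset and use it
# to map remote timestamps onto the local clock. Any constant transport
# latency is absorbed into the offset estimate, which is why a fixed
# delay is invisible to this scheme while jitter gets smoothed out.

class SimpleJitterCorrector:
    def __init__(self):
        self.offset_ms = None  # best (local - remote) offset seen so far

    def correct(self, remote_ms, local_ms):
        sample = local_ms - remote_ms
        # The fastest packet seen so far defines the offset estimate.
        if self.offset_ms is None or sample < self.offset_ms:
            self.offset_ms = sample
        return remote_ms + self.offset_ms

jc = SimpleJitterCorrector()
# Packets sent every 50 ms (remote clock), arriving with 0-40 ms of
# extra jitter on top of a constant link delay:
for remote, local in [(0, 1100), (50, 1190), (100, 1220), (150, 1270)]:
    print(jc.correct(remote, local))  # prints 1100, 1150, 1200, 1250
```

The corrected timestamps come out evenly spaced (the jitter is gone), but the whole series still sits a constant delay behind reality.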

Without taking this delay into account (which was thankfully constant throughout our testing), the drone would fly further and further behind the vehicle. At 50 km/h, a 160 ms delay causes a 2.2 m offset! But once we took that delay into account, the drone always landed on target. This only worked when the vehicle kept a constant velocity; reducing the latency to 25 ms helped with acceleration/deceleration and turns, but the controller could still be improved for more aggressive fast forward flight.

For our setup, we use a CUAV C-RTK 2 as our base station, which transmits “RTK corrections” to the drone, which relays them to the target. The drone has a CUAV C-RTK 9Ps (UART) and the target has 2 of the same GPS modules. We use 2 to get the heading, since the compasses are affected by the vehicle’s steel.

It’s pretty incredible that everything worked out smoothly: the GPS modules on the target are in a “moving-baseline” configuration to get the heading, but are also doing RTK from the base station data.

I’ve had issues using the C-RTK 2 unit on the drone and the 9Ps on the target, detailed here. I even found out later, due to a badly coded UAVCAN node (by CUAV), that the GPS velocity output was delayed by 200 ms compared to the GPS position output. This really messed with the drone’s EKF when flying at high velocity. Switching to the 9Ps on our drone fixed these issues, since it is a UART module, not DroneCAN/UAVCAN.

We also lowered the lower limits of the EK3 GPS accuracy parameters, to allow the EKF to trust the RTK GPS more for position and velocity estimates, which improved the drone’s flight.

Here’s a visual example of our latency.


There’s a lot in this graph:

  • The red zone indicates the drone is in Follow Mode
  • The green zone indicates the drone is in our Follow and Land Mode
  • The blue line represents the “ground truth” distance between the drone and the target. The POS log structures from both Pixhawks are synchronized using the logged GPS time.
  • The red line represents the distance between the drone and the target using the received and uncorrected data. This is calculated using the POS log of the drone and the FOLL.Lat/FOLL.Lon from the FOLL log of the drone.
  • The yellow line represents the distance between the drone and the target using the corrected received data from the target. This correction only includes the jitter correction. This is calculated using the POS log of the drone and the FOLL.LatE/FOLL.LonE from the FOLL log of the drone.
  • The purple line represents the distance between the drone and the virtual target, which in this case is 0.8 m behind the target (FOLL_OFS_X = -0.8). This is calculated by the drone in the Follow Mode and added to the FOLL log structure.
  • The green dotted line represents the moment the drone landed on the vehicle.
  • The red markers indicate the start and stop of the vehicle’s motion.

So Follow mode is enabled before the vehicle starts to move, and once the vehicle starts, we clearly see the jitter. The jitter correction performs wonderfully, but the corrected position starts to deviate from the ground truth. At impact, the drone thinks it is pretty much on target (purple line near 0), but as the vehicle slows down and comes to a stop, the distance to the virtual target increases by nearly a meter, indicating that the drone landed 0.8-1 m behind the target. At 19 km/h, that distance is equivalent to about 160 ms.

We added our own code to deal with the latency, and added a parameter for the delay (FOLL_DELAY). Here is the same kind of graph from a test using this delay compensation.

In this case, the yellow line includes both the jitter correction and our latency compensation, and it tracks the ground truth pretty well. And after impact, the distance to the virtual target stays unchanged as the vehicle slows down and stops.
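Our compensation code isn’t shown in this thread, but the core idea can be sketched as forward extrapolation of the received state by the configured delay (an illustration, not the actual Follow-mode code; names and numbers are mine):

```python
# FOLL_DELAY-style compensation: project the stale received NE position
# forward along the reported velocity by the assumed constant latency.

def compensate(pos_m, vel_mps, delay_s):
    """Extrapolate a (north, east) position forward by delay_s seconds."""
    return (pos_m[0] + vel_mps[0] * delay_s,
            pos_m[1] + vel_mps[1] * delay_s)

# Target reported at (100.0, 50.0) m moving north at 13.9 m/s
# (about 50 km/h), received 160 ms late:
print(compensate((100.0, 50.0), (13.9, 0.0), 0.160))
```

This assumes the velocity stays roughly constant over the delay, which matches our observation that the compensation works best at constant speed and degrades during acceleration and turns.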

We are so thankful that the jitter correction was already part of the code. Without it we would have been stumped for much longer!

Very cool graphs! Is that Matlab?

On a similar note, the moving landing code we used was more or less this one, which uses the PrecLand library to build the final target estimate.
There, the parameter PLND_LAG is used to correct for the exact same reason. So we did have access to that parameter and we did verify that it has an effect. However, as we didn’t have access to the logs of the target, it was hard to carry out the comparisons you did.

Did you also make use of the PrecLand module? This is just an FYI, I’m not proposing anything concrete here.

Oh interesting, we knew of the PrecLand library, but decided to proceed with our own landing code. We thought the PrecLand library was more aimed towards visual sensors and lower velocities, and we didn’t need all the complexity of safety checks and failure/retry management.

Which means our code is far from PR worthy, and really only applicable to the trials we were conducting (driving in a straight line at constant velocity). Once landing was triggered, there was no going back.

But we didn’t know about the PLND_LAG parameter. Looking at it now, it seems more like the delay of the visual sensor, but if there is none and the landing relies only on received messages, then that parameter represents the communication latency? A similar parameter could be implemented in the Follow mode by default, or maybe the jitter correction could be done using GMS instead of ms_since_boot (but that would require sending an additional message from the target to the drone).

Our custom Land mode basically reuses the Follow mode, modified to perform a simple maneuver before impact and to detect impact.

And yep, we graph with Matlab! I work in a university research lab. I’m gradually porting my own code to Python, but when making graphs for the prof, it’s Matlab :sweat_smile:

Any of you tested: AC_PrecLand: Use sensor timestamp to match inertial frame corrections by amilcarlucas · Pull Request #18548 · ArduPilot/ardupilot · GitHub ?

Might I toot my own horn and ask if you used Ardupilog?

I didn’t know of this PR. Do I get it right that the PrecLand backend should implement _backend->los_meas_time_ms();?

Perhaps it’s not very relevant with the original topic of this thread, but definitely interesting for Copter: Feature: Ship Landing by KosmX · Pull Request #24720 · ArduPilot/ardupilot · GitHub.
Not plug-and-play, since the .lua script doesn’t provide the measurement timestamp (I think), but nothing that can’t be fixed.

I rebased and improved the PR today, fixed a couple of bugs in corner cases.

Yep! I have used Ardupilog in the past! I didn’t realize that you were the creator!

I had to modify your code to add a datetime array to each log structure, to be able to synchronize two flight logs. However, I was hitting errors where TimeS or TimeMS had a huge jump, making datetime values millennia in the future, and I couldn’t find the source of the error. I was also using 30+ minute flight logs with LOG_REPLAY enabled, which made the conversion a bit long. So I eventually wrote my own code to modify the .mat log generated by MP to use a structure format like Ardupilog.
