GSoC 2020: Enhancements of non-GPS navigation and object avoidance with Realsense cameras

LuckyBird · May 7, 2020, 6:25am

The 2020 GSoC ArduPilot summer project has officially started. This year, I will have the privilege of working with @ppoirier and @jmachuca77. Thank you for letting me have another great summer with ArduPilot, and I hope we will have a fun and productive time working together!

1. Introduction

Unmanned Aerial Vehicles (UAVs), commonly known as drones, are becoming an essential tool for a variety of industries such as mapping, delivery, surveillance, warehousing, and agriculture, to name but a few. However, their potential is currently mostly limited to outdoor scenarios because their flight capability relies on a GPS signal, which becomes unstable or unavailable in many other situations: indoors, under a bridge, between tall buildings, in a tunnel, etc. In addition, another crucial component for fully autonomous operation, the ability to avoid obstacles that could pop up unpredictably during the mission, is often only available in high-end proprietary products that cost thousands of dollars, or requires a solution too complex for the sensors and processing power onboard low-end drones.

Navigation in non-GPS environments while avoiding obstacles therefore plays a critical role and is a highly desired feature for any autonomous robot. A VIO tracking camera and a depth camera, when appropriately combined, can offer both non-GPS navigation and obstacle-sensing abilities that don’t overexert the processors of low-end drones and, perhaps more importantly, leave plenty of room for further development by the user.

In this project, we will integrate and improve the usage of these technologies with ArduPilot, in an attempt to bring non-GPS navigation and obstacle avoidance into basic abilities of ArduPilot that everyone can rely on and build upon.

2. New enhancements coming to ArduPilot

The Intel Realsense T265 has been established as a “plug-and-play” sensor for ArduPilot from this 2019 GSoC project. Detailed introductions can be found on the following wiki pages:

Object avoidance with a rotating LiDAR has also long been available. However, there remains room for improvement. This project aims to contribute the following benefits to the community, which are also our main objectives:

Enhancements of the vision-related codebase

Summary: The handling of vision-related messages in the ArduPilot codebase has been moved under the AP_VisualOdom library. Certain improvements that can be introduced during this project include:

Enhancement of the handling of VISION_POSITION_ESTIMATE and VISION_POSITION_DELTA messages with the Intel Realsense T265 for both non-ROS and ROS.
Detect the tracking position reset counter of the VIO SLAM running on the T265 and pass it to AP using the “reset_counter” MAVLink2 extension field.
Another major improvement can be made by measuring the lag between the camera and the autopilot’s IMU and ensuring that the EKF is using the correct lag.
Send velocity information to AP, either by adding support for VISION_SPEED_ESTIMATE or ODOMETRY, which can carry all the information provided by the Realsense T265 in a single message.
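
As a rough illustration of the reset counter item above, here is a minimal Python sketch of how a companion-computer script might detect a tracking reset from a sudden jump in the reported position. The class name, the jump threshold and the jump-based detection itself are assumptions made for illustration only; they are not taken from the T265 SDK or from ArduPilot, and the counter is assumed to wrap as an 8-bit value.

```python
import math


class ResetCounter:
    """Counts sudden position jumps, treating each one as a tracking reset.

    Hypothetical helper for illustration; not part of any existing library.
    """

    def __init__(self, jump_threshold_m=0.5):
        # jump_threshold_m is an assumed tuning value, not an ArduPilot default.
        self.jump_threshold_m = jump_threshold_m
        self.prev_position = None
        self.counter = 0

    def update(self, position):
        """position: (x, y, z) in metres. Returns the current counter value."""
        if self.prev_position is not None:
            jump = math.sqrt(sum((a - b) ** 2
                                 for a, b in zip(position, self.prev_position)))
            if jump > self.jump_threshold_m:
                # Assumed 8-bit field, so wrap around at 256.
                self.counter = (self.counter + 1) % 256
        self.prev_position = position
        return self.counter
```

The returned value would then be copied into the reset_counter field of each outgoing vision message, so that the autopilot side can tell that the pose has jumped rather than moved.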

Integration of Realsense Depth camera

Summary: Develop method(s) to convert the depth image into OBSTACLE_DISTANCE MAVLink messages, which is the most straightforward way to interface with the AC_Avoidance library.

As of now, AP supports receiving obstacle information in one of two types of messages: DISTANCE_SENSOR and OBSTACLE_DISTANCE. Of the two, OBSTACLE_DISTANCE is the better choice because it provides an array of distances, which should better represent the depth data. Hence, data from the depth image will be converted to OBSTACLE_DISTANCE messages, which are then sent to the FCU.
First, it should be made clear how obstacles in the environment are represented by the OBSTACLE_DISTANCE message as well as by the depth image. Once the underlying principles are clear, a conversion can be made from the depth image to the messages. The messages can then be sent to the FCU at a predefined frequency.
We can now follow the OBSTACLE AVOIDANCE wiki page to perform testing. Most importantly, trials should be carried out to establish good default values for parameters and the system can then be released for beta testers (blog posts).
Once the performance is verified by a number of testers, official documentation can be made (wiki page).
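
To make the conversion step above more concrete, here is a minimal Python sketch that reduces one horizontal line of a depth image to the 72-element distance array (in centimetres) that OBSTACLE_DISTANCE carries. The function name, the default range limit and the decision to split the image into equal-width pixel groups (rather than equal angles) are simplifying assumptions for illustration, not the final implementation. Pixels with a depth of 0 are assumed to mean "no data".

```python
UINT16_MAX = 65535  # OBSTACLE_DISTANCE value for "unknown / not used"
NUM_SECTORS = 72    # length of the distances array in OBSTACLE_DISTANCE


def depth_row_to_distances(depth_row_m, hfov_deg, max_m=10.0):
    """Convert one row of depth values (metres) to 72 distances in cm.

    Hypothetical helper. Returns (distances_cm, increment_deg, angle_offset_deg),
    where the two angles describe how the sectors span the camera's
    horizontal field of view, centred on the forward direction.
    """
    width = len(depth_row_m)
    max_cm = int(round(max_m * 100))
    distances = [UINT16_MAX] * NUM_SECTORS
    for sector in range(NUM_SECTORS):
        # Pixel range that falls into this sector (equal-width approximation).
        start = sector * width // NUM_SECTORS
        end = max(start + 1, (sector + 1) * width // NUM_SECTORS)
        valid = [d for d in depth_row_m[start:end] if d > 0.0]
        if not valid:
            continue  # no usable depth in this sector: leave as unknown
        nearest = min(valid)
        if nearest > max_m:
            distances[sector] = max_cm + 1  # max_distance + 1: no obstacle
        else:
            distances[sector] = int(round(nearest * 100))
    increment_deg = hfov_deg / NUM_SECTORS
    angle_offset_deg = -hfov_deg / 2.0
    return distances, increment_deg, angle_offset_deg
```

Taking the nearest valid pixel per sector is the conservative choice for avoidance; averaging would be smoother but could hide thin obstacles.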

Integration of Tracking + Depth cameras (SLAM+ROS)

Summary: while tracking and depth cameras can be used separately with ArduPilot, this objective looks for a SLAM-based approach leveraging the Robot Operating System (ROS) framework to combine both cameras into one package to extend the potential use cases of ArduPilot.

To combine the two cameras, we first need to calibrate the extrinsics between them (for any type of mounting, not just the provided one) using OpenCV and Python.
Syncing the data streams between the cameras is also important. Here’s a working example.
There are several SLAM examples, such as this occupancy-mapping example that generates a 2D occupancy map, but RTABMap and Google Cartographer are also plausible alternatives.
The next step is to convert the data from the two cameras into the format that the new SLAM package accepts and into the position + obstacle data (with support added above) to be sent to AP.
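
As a small illustration of the syncing step in the list above, here is a minimal Python sketch that pairs each depth frame with the pose sample closest to it in time. The function name and the 20 ms tolerance are assumptions for illustration; it also assumes both streams are already stamped on the same clock, which is something that would need to be verified on the real hardware.

```python
import bisect


def pair_by_timestamp(pose_stamps, depth_stamps, max_diff_s=0.02):
    """Pair each depth timestamp with the nearest pose timestamp.

    Hypothetical helper. Both lists hold seconds, sorted in ascending order.
    Returns a list of (depth_index, pose_index) tuples; depth frames with
    no pose sample within max_diff_s are dropped.
    """
    pairs = []
    if not pose_stamps:
        return pairs
    for depth_idx, t in enumerate(depth_stamps):
        pos = bisect.bisect_left(pose_stamps, t)
        # The nearest sample is either just before or just after t.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(pose_stamps)]
        best = min(candidates, key=lambda i: abs(pose_stamps[i] - t))
        if abs(pose_stamps[best] - t) <= max_diff_s:
            pairs.append((depth_idx, best))
    return pairs
```

Since the tracking camera typically publishes poses much faster than the depth camera publishes frames, nearest-neighbour matching is usually good enough for a first version; interpolating between the two neighbouring poses would be a natural refinement.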

To properly process all the data from the two cameras (depth image, pose, potentially fisheye images), a new class of companion computer will be introduced to ArduPilot, namely the Up Squared.

GPS to non-GPS transition

Summary: Seamless transition between GPS and non-GPS environments is a highly desirable feature. Several solutions are currently under development and initial results are promising.

A direct approach, which is being actively developed and tested by @rmackay9, @anbello and @ppoirier, is to use VISION_POSITION_DELTA and EKF3 (which will also have lots of development coming), so that the delta in position and rotation data can assist the transition.
Another potential solution is to use additional sensor information to detect whether the vehicle is indoors or outdoors. For example, a sonar/1D range sensor can be mounted facing upward (pointing at the ceiling) to detect the transition from indoors (range data available) to outdoors (no range data) and trigger the indoor-outdoor transition.
One can also think of a solution, when the camera can see at least part of the sky, that uses images to detect when the vehicle is out in the open and when it is not, to facilitate the transition between GPS and non-GPS localization data. This has been done before in research.
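
The upward-facing range sensor idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, the ceiling range limit and the number of consecutive readings needed to confirm a change are all hypothetical, and a real implementation would need to handle sensor noise and openings such as skylights or tall doorways.

```python
class IndoorOutdoorDetector:
    """Guesses indoor/outdoor state from an upward-facing range sensor.

    Hypothetical helper: the state only flips after several consecutive
    readings disagree with it, to avoid toggling on single dropouts.
    """

    def __init__(self, max_ceiling_m=8.0, confirm_count=10):
        # Both values are assumed tuning parameters.
        self.max_ceiling_m = max_ceiling_m
        self.confirm_count = confirm_count
        self.indoor = False
        self._streak = 0

    def update(self, range_m):
        """range_m: latest reading in metres, or None if nothing was detected.

        Returns True while the vehicle is believed to be indoors.
        """
        sees_ceiling = range_m is not None and 0.0 < range_m <= self.max_ceiling_m
        if sees_ceiling == self.indoor:
            self._streak = 0  # reading agrees with the current state
        else:
            self._streak += 1
            if self._streak >= self.confirm_count:
                self.indoor = sees_ceiling
                self._streak = 0
        return self.indoor
```

The output could then be used as a hint for which position source to prefer, rather than as a hard switch.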

3. Useful References

Easier setup for Intel Realsense T265 by @rmackay9.
Using the Realsense T265 with MAVCONN library (cpp, non-ROS) to send various kinds of messages to AP by @anbello: blog post and Github repo.
Experiment with Visual Odometry - ROVIO part 1 and part 2 by @ppoirier.
Indoor flight with external navigation data by @chobitsfan.
2019 GSoC Project (integration of the tracking camera Realsense T265 and ArduPilot).

4. Progress Update

Realizing the objectives above will require clear and detailed system requirements, an implementation plan and deliverables, which will be gradually published in the form of blog posts. The posts are also a place to receive feedback and suggestions, so that the system can be improved or modified as needed before moving on to the next step, and they will serve as support pages for users in the future.
As these blog posts are introduced, the list will be updated below this section for easy lookup. As usual, any input is welcome and I look forward to hearing from you all.

–Update – Related Blogs and Wiki

ppoirier · May 7, 2020, 8:34am

Thanks for participating in another year of GSoC, @LuckyBird.

This second year will allow us to push the Visual Flight Autonomy experiments to the next level and hopefully gain more momentum in the community.

soldierofhell · May 7, 2020, 9:29am

Hi @LuckyBird, is this supposed to be for Rover? I must check what this Realsense occupancy is doing, but in general I would stick to the standard ROS navigation stack, so costmap_2d. RTAB-Map has a nice obstacles_detection nodelet to use. As for Copter, everything becomes complicated, but at least you can get an Octomap from RTAB easily. Definitely go for RTAB, e.g. you can compile it and directly use VINS-Fusion and LOAM for odometry. Good luck, I will be following.

chobitsfan · May 7, 2020, 11:32am

Send velocity information to AP, either by adding support for VISION_SPEED_ESTIMATE or ODOMETRY, which can carry all the information provided by the Realsense T265 in a single message.

Hi @LuckyBird, I have a PR, “AP_NavEKF2: support VISION_SPEED_ESTIMATE”. Maybe I could help?

LuckyBird · May 7, 2020, 11:34am

Hi @soldierofhell, the focus of this project is Copter, but of course you can test it out on any other platforms.
RTABMAP is definitely a prominent solution. The main thing is to test whether the results are good enough given the computer (Up squared) and the sensors (T265 + D435), compare it to other options out there and select one that would be easy for newcomers to replicate.

LuckyBird · May 7, 2020, 11:39am

@chobitsfan I was actually referring to your PR when I wrote that part, thank you for pointing it out.
At the moment, I mainly focus on the onboard computer side (providing the correct message). If the PR goes in soon, I will push the stuff related to VISION_SPEED_ESTIMATE message forward, otherwise I will focus on the other parts first.

soldierofhell · May 7, 2020, 1:32pm

@LuckyBird, I asked because you wrote about 2d occupancy, but of course simplifying the problem to some constant altitude monitoring can be fine. BTW There’s this former GSoC project OctomapPlanner. It doesn’t use SLAM (localization on /map), but can be reused to some degree (planning/collision), although for 2D I would stick to navigation stack compatible tools

rmackay9 · May 8, 2020, 12:03am

Yes, getting this PR in is still very high on my to-do list. I will probably first fix the EKF’s glitch handling when using external nav and then merge this PR in. I think it will need a touch of rework to move it to use the AP_VisualOdom library, but that shouldn’t be too hard, I hope.

rmackay9 · May 8, 2020, 12:14am

@LuckyBird, great to have you back for another year! I’m really happy to read about these plans, especially the sending of depth data into AP for use in avoidance. It may be helpful to capture the depth from different areas of the screen depending upon the vehicle attitude, i.e. if it’s leaning forward, use the top of the frame. If it’s rolled to the right, use an area that stretches from the top right to the bottom left… or maybe it’s best to just get the closest items we see. Alternatively, we could modify the AP side so it interprets the depths in body-frame.

Re handling the transition between GPS and non-GPS environments, Tridge and I are also keen to resolve this. We think the first step is enhancing the EKF to support a manual switch but soon after we want to make it automatic. It may be that we can do this by comparing the reported accuracies from the GPS and Cameras (this is how we do GPS blending) or maybe we can look at the innovations. For example we could modify the EKF so it has a method which can test a sensor input and return what the innovation would be if used. Then we could use the sensor with the lower innovation. Another idea is to run two parallel EKFs and use the one that says it’s more healthy.
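
The "use the sensor with the lower innovation" idea above can be sketched roughly like this in Python. It is only a one-dimensional illustration under stated assumptions: the function names, the hysteresis margin and the scaling of the innovation by the reported variance are all hypothetical and do not reflect how the ArduPilot EKF is actually structured.

```python
def normalized_innovation(measured, predicted, variance):
    """Squared difference between measurement and prediction, scaled by variance."""
    return (measured - predicted) ** 2 / variance


def pick_source(predicted, candidates, current=None, margin=2.0):
    """Choose the position source whose measurement best matches the prediction.

    Hypothetical helper. candidates maps a source name to a
    (measured, variance) tuple. The current source is kept unless another
    one is better by more than the factor `margin`, to avoid rapid switching.
    """
    scores = {name: normalized_innovation(meas, predicted, var)
              for name, (meas, var) in candidates.items()}
    best = min(scores, key=scores.get)
    if current in scores and scores[current] <= margin * scores[best]:
        return current
    return best
```

For example, `pick_source(10.0, {"gps": (13.0, 4.0), "vision": (10.2, 0.04)}, current="gps")` would switch to "vision", because the GPS reading disagrees with the prediction by more than the margin allows.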

Anyway, I’m very much looking forward to this project and helping out where I can

chobitsfan · May 8, 2020, 3:08am

I think it will need a touch of rework to move it to use the AP_VisualOdom library, but that shouldn’t be too hard, I hope.

Hi @rmackay9 Maybe I could help?

rmackay9 · May 8, 2020, 7:48am

@chobitsfan, I was hoping you might offer! If you want to try modifying your PR (or making a new one) that handles the speed message in the AP_VisualOdom library that would be great. There’s a _MAV backend and an _IntelT265 backend, but if we want both to be able to handle the message then the handler code can probably go into the AP_VisualOdom_Backend similar to the handle_vision_position_delta_msg() method.

chobitsfan · May 14, 2020, 3:45am

Thank you for suggestions. I have created https://github.com/ArduPilot/ardupilot/pull/14368

arush · May 14, 2020, 2:20pm

Hey @LuckyBird,
I’ve recently sold my company and am looking for an open source project to get stuck into. I have a jetson nano and pixhawk cuav v5+ and am moderately familiar with ardupilot. I’ve been implementing the redtail project on that system. I have completed the udacity autonomous flight course and published a lot of my work here: https://arush.webflow.com

Do you think I could help?

LuckyBird · May 15, 2020, 5:16am

Hi @arush, thank you for your interest in the project.

If you are keen on trying the project, details will be provided in the upcoming blog posts regarding setup and testing.

Beyond this project, there are many open issues in the codebase along with various ways that you can contribute to ArduPilot. You can also find many desired feature requests from users on this forum as well as Github.

I would suggest exploring them, then choose something that sparks your interest and make a project out of it.

Corrado_Steri · May 15, 2020, 10:12am

Hello there, great work you guys have done and are going to do!!!

Could you elaborate a bit more, for people like me who are rather ignorant, on the differences between the T265 and D400 cameras? I guess they do different jobs but I don’t quite understand what.

Thanks

LuckyBird · May 15, 2020, 12:09pm

Hi @Corrado_Steri, in a nutshell the two lines of cameras are built for different purposes:

The T265 camera is meant for Visual-Inertial tracking, with very large field-of-view grayscale cameras, an IMU inside and a dedicated processing unit for tracking the position of the camera in the world.
The D4xx camera is meant for depth measurements, with normal/large field-of-view infrared (IR) cameras and an RGB camera, optionally with an added IMU, and a dedicated processing unit to measure the depth of the scene right in front of the camera.

From the software point of view, you can use one for the other’s purpose: use the T265’s cameras to calculate depth and the D4xx’s camera+IMU for tracking. But the hardware’s limitations might lead to sub-optimal performance (since the software leverages the hardware) unless you use really good software, which will most likely consume a lot of your own processing power that you wouldn’t otherwise have to spend.

Here’s an article that sums up all the details, a video introducing the different types of cameras and a video regarding software for the tracking camera as well.