GSoC 2018: Realtime Mapping and Planning for Collision Avoidance

Some of you might be wondering why I am writing a blog post on collision avoidance. Isn’t it already there in ArduPilot? After all, there is even a dedicated wiki that explains precisely how it works, right? Well, I agree with you, but there are a few shortcomings with that approach. First, you need an expensive sensor like the LightWare SF40C to measure the distance to obstacles. Suppose you saved a few bucks (~$1k for the LightWare SF40C), managed to get one, and took your drone out to play around, or to brag about the cool sensor you just bought to your friends, only to notice that the drone is still unable to fly through narrow passages or sometimes simply fails to find a way around obstacles. Well, I know the feeling. And then there is the thing that probably bothers me the most: why can’t I just give a goal and have the copter do the rest, even in a maze-like environment? So I thought to myself, wouldn’t it be awesome to have a completely autonomous system that uses a cheap stereo camera (going for $60 to $80 these days) and does all of these things?

This summer, I am quite excited to be working on exactly that as part of GSoC 2018! My proposal is basically divided into two parts: first, real-time 3D mapping, and second, goal-directed planning for obstacle avoidance. The idea is to keep generating a 3D map of the environment on the fly and try to reach the goal point by reactively computing intermediate waypoints that avoid the obstacles registered on the map. The map and the trajectories can be computed on a companion computer like a Raspberry Pi or Odroid.

Let’s talk some more about the 3D mapping framework and how it will work. As mentioned above, I am using a stereo camera, which gives us depth information about the environment. The way this works is that we try to figure out where exactly an arbitrary point, say [x, y, z], projects into both cameras (in pixel coordinates). Using this correspondence for every point in the left and right image pair, we create a disparity map like this:

This disparity map can then be projected into a point cloud if we know the baseline between the two cameras. Thankfully, all of this can be implemented using simple OpenCV functions. The end result is that for every image frame we have a 3D point cloud which can now be used for mapping.
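For the curious, here is a minimal OpenCV sketch of these two steps, assuming a rectified image pair; the SGBM settings are placeholders rather than tuned values from this project, and the Q (reprojection) matrix would come from your stereo calibration:

```cpp
// Disparity from a rectified stereo pair, then reprojection to a 3D point cloud.
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat left  = cv::imread("left.png",  cv::IMREAD_GRAYSCALE);
    cv::Mat right = cv::imread("right.png", cv::IMREAD_GRAYSCALE);

    // Semi-global block matching: minDisparity, numDisparities (multiple of 16), blockSize.
    cv::Ptr<cv::StereoSGBM> sgbm = cv::StereoSGBM::create(0, 128, 5);
    sgbm->setP1(8 * 5 * 5);
    sgbm->setP2(32 * 5 * 5);
    sgbm->setUniquenessRatio(10);

    cv::Mat disp16, disp;
    sgbm->compute(left, right, disp16);          // fixed-point disparity (scaled by 16)
    disp16.convertTo(disp, CV_32F, 1.0 / 16.0);  // back to pixel units

    // Q is the 4x4 reprojection matrix from cv::stereoRectify(); it encodes the
    // focal length and the baseline between the two cameras.
    cv::Mat Q = cv::Mat::eye(4, 4, CV_64F);      // placeholder: use the calibrated Q
    cv::Mat xyz;
    cv::reprojectImageTo3D(disp, xyz, Q, true);  // per-pixel [X, Y, Z] in the camera frame
    return 0;
}
```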

The idea behind mapping is simple. First, we need a reference frame. For now, let us assume it is the local NED frame, which means that our map is aligned with north as its +X direction and has its origin at the local frame’s origin. Next, we need to transform the point cloud from the camera frame into this map frame before integrating it into our final map. This transformation comes from the drone’s odometry, so the quality of the map is directly proportional to the accuracy of the odometry. This should give us a fused 3D point cloud which looks like this:
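As a rough illustration of the transform step, here is a small sketch using PCL and Eigen; the camera-to-body extrinsic and the axis conventions are assumptions for the example, not code from the project:

```cpp
// Transform one camera-frame cloud into the local NED map frame using the drone's pose.
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/common/transforms.h>
#include <Eigen/Geometry>

pcl::PointCloud<pcl::PointXYZ>::Ptr
cameraToLocalNED(const pcl::PointCloud<pcl::PointXYZ>::Ptr& cloud_cam,
                 const Eigen::Vector3f& position_ned,     // drone position in local NED (from odometry)
                 const Eigen::Quaternionf& attitude_ned,  // drone attitude, body -> NED (from odometry)
                 const Eigen::Affine3f& cam_to_body)      // static camera-to-body extrinsic
{
    // body -> NED transform built from the odometry sample
    Eigen::Affine3f body_to_ned = Eigen::Affine3f::Identity();
    body_to_ned.translate(position_ned);
    body_to_ned.rotate(attitude_ned);

    // full chain: camera -> body -> NED (the map frame)
    Eigen::Affine3f cam_to_ned = body_to_ned * cam_to_body;

    pcl::PointCloud<pcl::PointXYZ>::Ptr cloud_ned(new pcl::PointCloud<pcl::PointXYZ>());
    pcl::transformPointCloud(*cloud_cam, *cloud_ned, cam_to_ned);
    return cloud_ned;
}
```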

But wait, our ultimate objective is not just to create a 3D map but also to use it for planning. This is where a different map representation, called an occupancy map, comes into the picture. The idea behind an occupancy map is to discretize the world into a grid of cells and use Bayes’ theorem to compute and update the posterior probability of each cell being occupied, using the current measurements as observations. This representation serves two purposes. First, we can easily find a path to the goal that avoids the occupied cells. Second, we are not simply discarding information from the past; we use it to improve the map at every sensor feedback iteration.
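In practice this update is usually done in log-odds form, which turns the Bayes update into a simple per-cell addition (this is also what OctoMap does internally). A minimal sketch, where the hit/miss probabilities and clamping bounds are just example values:

```cpp
// Log-odds occupancy update for a single grid cell.
#include <cmath>
#include <algorithm>

inline float logOdds(float p)     { return std::log(p / (1.0f - p)); }
inline float probability(float l) { return 1.0f / (1.0f + std::exp(-l)); }

// Add the sensor model's log-odds for a "hit" (endpoint in the cell) or a
// "miss" (ray passed through the cell), and clamp so the cell can still
// change state after many consistent observations.
float updateCell(float cell_logodds, bool hit)
{
    const float l_hit  = logOdds(0.7f);   // P(occupied | hit)  = 0.7, example value
    const float l_miss = logOdds(0.4f);   // P(occupied | miss) = 0.4, example value
    const float l_min  = logOdds(0.12f);
    const float l_max  = logOdds(0.97f);

    cell_logodds += hit ? l_hit : l_miss;
    return std::clamp(cell_logodds, l_min, l_max);
}
```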

But the story isn’t over yet. This kind of representation scales well in 2D, as in the case of a rover, but in 3D it becomes quite difficult to keep track of each and every cell in the map. It also poses a serious memory overhead as the size of the map increases. To resolve this issue, an interesting data structure called the octree was proposed. An octree encodes the 3D grid information efficiently in memory and makes operations like traversal very fast. Hence, I am currently using a library called OctoMap which does all of this for me. The input to this tree is a point cloud, and after thresholding the probabilities we get a binary occupancy map containing information about all the occupied and free cells. Here is how it looks:

(octomap occupancy map visualization)
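For reference, feeding a fused point cloud into an OcTree and reading back the thresholded result looks roughly like this; the resolution and file name are just example values:

```cpp
// Insert a fused (map-frame) point cloud into an OctoMap OcTree.
#include <octomap/octomap.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <iostream>

void insertScan(octomap::OcTree& tree,
                const pcl::PointCloud<pcl::PointXYZ>& cloud_ned,
                const octomap::point3d& sensor_origin)
{
    // Copy the PCL cloud into octomap's own point cloud type.
    octomap::Pointcloud scan;
    for (const auto& p : cloud_ned.points)
        scan.push_back(p.x, p.y, p.z);

    // Ray-casts from the sensor origin to every endpoint, updating the
    // occupancy log-odds of the free and occupied cells along each ray.
    tree.insertPointCloud(scan, sensor_origin);
    tree.updateInnerOccupancy();
}

int main()
{
    octomap::OcTree tree(0.1);   // 10 cm leaf resolution, example value
    // ... call insertScan() for every incoming fused cloud ...

    // After thresholding, every leaf is either occupied or free:
    int occupied = 0;
    for (auto it = tree.begin_leafs(); it != tree.end_leafs(); ++it)
        if (tree.isNodeOccupied(*it))
            ++occupied;
    std::cout << "occupied leafs: " << occupied << std::endl;

    tree.writeBinary("map.bt");  // thresholded binary occupancy map
    return 0;
}
```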

The next steps involve running some more SITL simulation tests to calibrate parameters that improve the quality of the map, and extracting only a small bounded region of it before handing it to the planner. I am also thinking of projecting this map into 2D and publishing it as DISTANCE_SENSOR messages, so that the AC_Avoidance library can act as a failsafe if the planner fails to find a feasible trajectory.
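To give an idea of what that 2D projection could look like, here is a rough sketch that collapses the occupied cells in a small box around the vehicle into per-sector horizontal distances (which could then be packed into one DISTANCE_SENSOR message per orientation); the sector count, bounding box and the whole approach are assumptions for illustration, not the project’s code:

```cpp
// Minimum horizontal distance to occupied cells, in 8 sectors around the vehicle.
#include <octomap/octomap.h>
#include <algorithm>
#include <array>
#include <cmath>
#include <limits>

std::array<float, 8> sectorDistances(const octomap::OcTree& tree,
                                     const octomap::point3d& vehicle_ned)
{
    constexpr float kPi = 3.14159265f;
    std::array<float, 8> dist;
    dist.fill(std::numeric_limits<float>::infinity());

    // Only look at a small bounded region around the vehicle (10 m x 10 m x 2 m here).
    octomap::point3d bbx_min = vehicle_ned - octomap::point3d(5.0f, 5.0f, 1.0f);
    octomap::point3d bbx_max = vehicle_ned + octomap::point3d(5.0f, 5.0f, 1.0f);

    for (auto it = tree.begin_leafs_bbx(bbx_min, bbx_max); it != tree.end_leafs_bbx(); ++it) {
        if (!tree.isNodeOccupied(*it))
            continue;

        float dx = it.getX() - vehicle_ned.x();
        float dy = it.getY() - vehicle_ned.y();
        float range = std::sqrt(dx * dx + dy * dy);   // project onto the horizontal plane

        // 8 sectors of 45 degrees, sector 0 facing +X (north in the local NED frame).
        float bearing = std::atan2(dy, dx);
        int sector = (static_cast<int>(std::lround(bearing / (kPi / 4.0f))) + 8) % 8;
        dist[sector] = std::min(dist[sector], range);
    }
    return dist;   // one entry per DISTANCE_SENSOR orientation
}
```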

I am developing this framework without any dependency on ROS, so that most of you can use it on any Linux-based system for now, but I do have plans to do the same using ROS for folks (like me :P) who are into that. The teaser image is from a similar pipeline which I implemented in ROS a while ago [code].

I have yet to test this on my drone; I am just hoping for the weather to get a little better. I hope to share some more results on the drone with you soon. Once the mapping framework is complete, I will also share my ideas on how I will implement the planning module. I look forward to your suggestions and comments.

I would like to thank my mentors @jmachuca77 and @rmackay9 for their valuable feedback and also @khancyr and @tridge for helping me fix certain issues with the SITL simulations.


Oh interesting, subscribed!


Fantastic! Can you really process this fast enough on a raspberry to be useful?


Great start, will be interesting to watch your progress


Awesome!
Love your work!


I hope so, although currently the bottleneck is fusing the point cloud to generate the octomap. I am using multithreading to make my code as fast as possible, but I think operating the drone aggressively will require a faster processor than the Pi and probably a lot more optimization in the code. I’ll try it on the Odroid, since I have one with me, and see how fast it can work on that.

@AyushGaud congratulations on your work, and thanks for sharing.
I have a small doubt: the stereo camera you linked looks like an interesting product, but it uses rolling shutter cameras, whereas all the visual odometry packages I have seen recommend global shutter cameras. What do you think about this?

Cool question! @anbello, I hear this a lot. Interestingly, one of the most popular stereo cameras is the ZED, and it has shown great results in motion tracking as well as reconstruction. In my experience so far, global shutter cameras are worth the money if the quality of visual odometry is the primary concern and the task involves high-rate angular motion. You might like this (~$150) camera if you are interested in inexpensive global shutter cameras. I have used both it and the ZED, and they both work fine.

The Realsense R200 and D450 both have stereo IR global shutter cameras as well. They also provide a synthetic depth map in real time from an onboard ASIC; not sure if you can use that for this project?

Yeah, they are both good options, but IR-based (or any time-of-flight based) cameras generally don’t work well in natural lighting conditions. Although Intel claims that their new D400 lineup supports outdoor usage with its passive IR stereo pair, I have not seen any reviews of it yet, so I am not sure about that. It would be really interesting to see if it works, since these sensors are also quite competitively priced.

I ended up using the R200 as just a cheap global shutter camera for CV work, see second and third videos here:


From what few depth videos I’ve seen from the D400s it’s quite a bit better outside!

I don’t know about your part of the world, but here (Canada) it will be available AFTER @AyushGaud’s GSoC…

Mouser #: 607-82635AWGDVKPRQ
Description: Video Modules Real Sense Depth Camera D435
Stock: 0
On Order: 147
Factory Lead-Time: 20 Weeks

The 415 is in stock, but it is NOT a global shutter.

Cool! I’ll see if I can get my hands on one. I am currently in India, and here the D415 costs ~$350.

Cool! I suggest using PCL (the Point Cloud Library) because it has a lot of functions and optimized algorithms for processing point clouds. Could you please mention how you are doing the simulation testing right now with SITL?

@AyushGaud Did you have a chance to read through this post?
http://official-rtab-map-forum.67519.x6.nabble.com/RGB-D-SLAM-example-on-ROS-and-Raspberry-Pi-3-td1250.html

Thanks for your suggestion @Subhi, I am also currently using the PCL library. For SITL testing, I have added a stereo camera to the Iris drone Gazebo model. I subscribe to the image topic from Gazebo and perform the disparity computation and point cloud generation. Then I apply a few filters (SOR, passthrough and voxel) and pass the result on for octomap generation. For fusing multiple point clouds, I have written a small MAVLink code which subscribes to the drone’s odometry and uses it to transform each of the point clouds into the local NED frame.
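For reference, a filter chain like the one described above could look roughly like this with PCL; the thresholds and the leaf size are just example values, not the ones used in the project:

```cpp
// Statistical outlier removal -> passthrough depth cut-off -> voxel grid downsampling.
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <pcl/filters/statistical_outlier_removal.h>
#include <pcl/filters/passthrough.h>
#include <pcl/filters/voxel_grid.h>

pcl::PointCloud<pcl::PointXYZ>::Ptr
filterCloud(const pcl::PointCloud<pcl::PointXYZ>::Ptr& input)
{
    pcl::PointCloud<pcl::PointXYZ>::Ptr sor_out(new pcl::PointCloud<pcl::PointXYZ>());
    pcl::PointCloud<pcl::PointXYZ>::Ptr pass_out(new pcl::PointCloud<pcl::PointXYZ>());
    pcl::PointCloud<pcl::PointXYZ>::Ptr voxel_out(new pcl::PointCloud<pcl::PointXYZ>());

    // SOR: drop points that are far from their neighbours (noise in the disparity map).
    pcl::StatisticalOutlierRemoval<pcl::PointXYZ> sor;
    sor.setInputCloud(input);
    sor.setMeanK(50);
    sor.setStddevMulThresh(1.0);
    sor.filter(*sor_out);

    // Passthrough: keep only points within a useful depth range of the camera.
    pcl::PassThrough<pcl::PointXYZ> pass;
    pass.setInputCloud(sor_out);
    pass.setFilterFieldName("z");
    pass.setFilterLimits(0.5, 10.0);
    pass.filter(*pass_out);

    // Voxel grid: downsample to a fixed resolution before octomap insertion.
    pcl::VoxelGrid<pcl::PointXYZ> voxel;
    voxel.setInputCloud(pass_out);
    voxel.setLeafSize(0.05f, 0.05f, 0.05f);
    voxel.filter(*voxel_out);

    return voxel_out;
}
```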


Thanks @ppoirier, RTAB-Map is one of my favourite implementations. I once tried it on an Odroid with the ZED camera and the results were pretty awesome. I would also love to add support like this (maybe a small wrapper would do the trick) to ArduPilot, which could do dense mapping as well as improve localization accuracy. There is just a small downside: these frameworks optimize the poses and the map in an offline fashion, since bundle adjustment and loop closure happen over previous keyframes. Using them to improve the map representation is therefore not always helpful, because the point clouds need to be integrated into the occupancy map at every iteration; otherwise the complete map has to be regenerated after every optimization. Of course, there are methods to do this efficiently (like the memory management proposed in RTAB-Map itself) and use it for octomaps, but that would require a significant amount of time and probably falls into the domain of research.

Basically, that comes down to the main issue with the project: the capability to process a minimal mapping front end (excluding bundle adjustment and loop closure) on an RPi or Odroid at the moment. And this is why I was asking about that thread, because the author says this:

4. Keep in mind that on RPi, rtabmap works at max 4-5 Hz. Adding navigation in 3D (you may need an octomap) may not be possible on a single RPi. Another design could be to stream the data from the quadcopter to a workstation, on which mapping and navigation run, sending back the commands.

Keep up the good work, and I really appreciate you keeping us updated in this discussion.

Currently, the octomap generation (the bottleneck) takes ~1 s on my PC itself (without the voxel filter), although if I decrease the map resolution and increase the voxel leaf size I can run it at ~10 Hz, which is still not that great given that it’s a desktop computer. I am currently trying to run some tests on the Odroid with a stereo camera. I’ll keep you posted with further results as soon as I get them.

I have no direct experience with this, but offloading your bottleneck calculations to something like the Movidius Myriad 2 may help you do this fast on a Raspberry Pi. It is in a bunch of things like DJI drones, Raspberry Pi vision add-ons and the Neural Compute Stick (USB).

It is hard to find information on it but it looks like it may work: https://www.hotchips.org/wp-content/uploads/hc_archives/hc28/HC28.21-Tutorial-Epub/HC28.21.2-3D-Depth-Sensors-Epub/HC28.21.221-volumetric-data-Moloney-movidius-18Aug2016.pdf
