Hello everyone,
My team and I have been working on monocular depth estimation based navigation for an indoor drone as our final-year engineering project. The only sensors you need are a single camera and an optical flow module. We adapted MonoNav, a project built on ZoeDepth and the Crazyflie platform, to ArduPilot, swapped the depth estimation model for Depth Anything V2, and made some improvements to the navigation planner.
Well, here it is:
I have tried to document as much as I could inside the readme. As shown in the image, this implementation can actually avoid obstacles!
Going forward, I think it’s best to ditch the motion-primitive planner, use a global planning algorithm instead, and hand the resulting path to ArduPilot’s waypoint navigation, as this approach should be more reliable. With 20/20 hindsight, we should have tried this as early as possible instead of trying to figure out why the existing implementation wasn’t satisfactory.
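To make the global-planner idea a bit more concrete, here is a minimal sketch of A* on a 2D occupancy grid. This is not what the repo does today. It assumes the map has been flattened into a grid of free (0) and occupied (1) cells at flight height, and the name `astar` is just for illustration.

```python
import heapq

def astar(grid, start, goal):
    """A* on a 4-connected grid. grid[r][c] is 0 (free) or 1 (occupied).

    Returns the path as a list of (row, col) cells, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for 4-connected unit-cost moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry, a cheaper route was found already
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

The returned cells would still need to be converted to local positions and thinned out before being sent as waypoints. That part depends on the map resolution and frame conventions, so it is left out here.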
Theoretically, if you switch the depth model to the one trained on the VKITTI dataset, it should work outdoors too, but we haven’t tested this.
I’d appreciate it if you could test this, and improvements are welcome.
Thanks for the blog on this subject! Monocular position estimation seems very hard. I’ve never actually seen someone succeed, but maybe I haven’t looked hard enough.
In that repo you’ll find a script called AP_ObstacleAvoidance. It is a modification of d4xx_to_mavlink.py, which is used for RealSense-based object avoidance. However, BendyRuler didn’t work for me even though the proximity data appears correct. Any idea why that is?
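For anyone who wants to sanity-check the proximity values by hand, here is a minimal sketch of how one row of a metric depth map can be reduced to the 72 distances that a MAVLink OBSTACLE_DISTANCE message carries (centimetres, with 65535 meaning "unknown"). This is not the code from the repo or from d4xx_to_mavlink.py. The function name, the range limits, and the choice to report the closest valid pixel per sector are my assumptions.

```python
UINT16_MAX = 65535  # MAVLink OBSTACLE_DISTANCE value for "distance unknown"

def depth_row_to_sectors(depth_row_m, num_sectors=72, min_m=0.2, max_m=10.0):
    """Reduce one horizontal row of depth readings (metres) to sector distances (cm).

    Each sector reports the closest valid reading among the pixels it covers.
    Sectors with no reading inside [min_m, max_m] are reported as UINT16_MAX.
    """
    n = len(depth_row_m)
    if n < num_sectors:
        raise ValueError("need at least one pixel per sector")
    sectors = []
    for i in range(num_sectors):
        # Integer arithmetic spreads the pixels evenly over the sectors.
        start = i * n // num_sectors
        end = (i + 1) * n // num_sectors
        valid = [d for d in depth_row_m[start:end] if min_m <= d <= max_m]
        sectors.append(int(round(min(valid) * 100)) if valid else UINT16_MAX)
    return sectors
```

Printing this list next to what the flight controller reports in its proximity view is a quick way to check that the sector order and the units match on both sides.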
Also, some things to note about the current implementation that I haven’t mentioned in the readme:
The planning is based purely on the obstacles that have already been seen and integrated into the VoxelBlockGrid (VBG). It might be better to add an emergency reactive maneuver based on the depth map of the current frame, without integrating it into the VBG first. This would help prevent the drone from flying straight into obstacles.
It is primarily meant for static environments, so moving objects will be integrated into the VoxelBlockGrid and stay there forever. I tried to implement a VBG update for the areas that the camera is currently seeing, but failed and ran out of time.
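The emergency reactive check from the first note could look roughly like the sketch below. It looks only at the current depth frame, so it does not depend on the map at all. Nothing like this exists in the repo yet. The function name, the thresholds, and the central-window heuristic are all assumptions that would need tuning on the real drone.

```python
def emergency_stop_needed(depth_m, stop_distance_m=0.6, window_frac=0.4, min_hits=3):
    """Return True if the centre of the current depth frame shows a close obstacle.

    depth_m is a list of rows of metric depth in metres. Only a central window
    covering window_frac of each dimension is checked, and at least min_hits
    pixels must be closer than stop_distance_m so that a single noisy pixel
    does not trigger a stop. Non-positive readings are treated as invalid.
    """
    rows, cols = len(depth_m), len(depth_m[0])
    r0 = int(rows * (1 - window_frac) / 2)
    c0 = int(cols * (1 - window_frac) / 2)
    r1, c1 = rows - r0, cols - c0

    hits = 0
    for row in depth_m[r0:r1]:
        for d in row[c0:c1]:
            if 0.0 < d < stop_distance_m:
                hits += 1
                if hits >= min_hits:
                    return True
    return False
```

If this returns True, the planner output would be overridden with a brake or hover command for that cycle. How to send that command is left out because it depends on the flight mode in use.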
There’s a new model called DepthAnything3, which also seems to implement some SLAM, although the compute required for it is huge. I hope someone builds a fully single-camera navigation stack that eliminates optical flow, using either that model or a dedicated SLAM pipeline along with a monocular depth model.