Your friend's XY from the camera is relative to the aircraft, since the camera will be looking directly down. So all you need to know is which way to move the aircraft until X = 0 and Y = 0. You should build in some tolerance, otherwise it will get twitchy in flight.
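As a rough sketch of what I mean by tolerance, something like the following would do. The function name and the default of 10 pixels are just my own placeholders, not anything from your friend's code, and which way a positive X or Y should move the aircraft depends on how the camera is mounted.

```python
def apply_deadband(x_offset, y_offset, tolerance=10.0):
    """Zero out any offset that is inside the tolerance band.

    x_offset, y_offset: target offset from the frame center (pixels).
    tolerance: hypothetical deadband width; tune it from flight tests.
    """
    x = 0.0 if abs(x_offset) <= tolerance else x_offset
    y = 0.0 if abs(y_offset) <= tolerance else y_offset
    return x, y


# Inside the band on both axes: no correction, so the aircraft holds still.
print(apply_deadband(3, -4))
# Outside the band on X only: correct X, leave Y alone.
print(apply_deadband(25, -4))
```

Once both values come back as zero you treat the aircraft as centered and carry on descending, instead of chasing every pixel of noise.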
The example you posted from GitHub looks like it detects a marker object in the video frame, probably the center of the landing pad pattern. It could be finding the center of the target, drawing a marker on the video output at that point, and using that value from then on. You need to know the width and height of the landing pad pattern so that the code can work out the offset value as you descend. Then feed the xoffset and yoffset into the movement command.
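Here is a minimal sketch of how the pad size can turn a pixel offset into a real-world offset. I am assuming the detector gives you the target center and its apparent width in pixels; all the names and numbers are made up for illustration and are not taken from the GitHub example.

```python
def offset_in_metres(target_cx, target_cy, target_width_px,
                     frame_width, frame_height, pad_width_m):
    """Convert the target's pixel offset from the frame center to metres."""
    if target_width_px <= 0:
        raise ValueError("target width in pixels must be positive")
    # Pixel offset of the detected target center from the frame center.
    dx_px = target_cx - frame_width / 2
    dy_px = target_cy - frame_height / 2
    # Scale factor: known pad width divided by its apparent width in pixels.
    metres_per_px = pad_width_m / target_width_px
    return dx_px * metres_per_px, dy_px * metres_per_px


# 640x480 frame, 0.5 m wide pad seen 100 px wide, centered 100 px right of middle.
x_m, y_m = offset_in_metres(420, 240, 100, 640, 480, 0.5)
print(x_m, y_m)
```

Because the pad looks bigger as you descend, the metres-per-pixel factor shrinks on its own, so the same pixel offset maps to a smaller real movement near the ground.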
The distance variable seems to be trying to use X, Y and Z to get the distance, but you would need a 3D camera for that. Instead, alongside XOffset etc., knowing the real size of the target and how large it appears in your video frame will tell you how close you are. The value relating apparent size to distance will depend on the camera, lens etc. Just do some simple tests with the camera at known distances from the target to get your real values, and then you will be able to code them into your algorithm.
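To show the idea, here is a small sketch using the simple pinhole camera approximation, where apparent size is inversely proportional to distance. That model is my assumption, as are the function names and the example numbers; a wide-angle lens with heavy distortion would need a proper calibration.

```python
def calibrate_focal_px(pad_width_m, known_distance_m, measured_width_px):
    """Work out an effective focal length (in pixels) from one bench test."""
    return measured_width_px * known_distance_m / pad_width_m


def estimate_distance_m(focal_px, pad_width_m, measured_width_px):
    """Estimate the distance to the pad from its apparent width in pixels."""
    if measured_width_px <= 0:
        raise ValueError("measured width in pixels must be positive")
    return focal_px * pad_width_m / measured_width_px


# Bench test (made-up numbers): a 0.5 m pad at 1.0 m appears 200 px wide.
focal_px = calibrate_focal_px(0.5, 1.0, 200)
# Later, in flight, the same pad appears 100 px wide.
print(estimate_distance_m(focal_px, 0.5, 100))
```

Taking the measurement at several distances and averaging the resulting focal value should give you a better number than a single test.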
Hope it helps and that I understood your question correctly.