I am currently pulling frames from the camera on an iOS device. From these frames I use ORB to find keypoints and compute their descriptors, and I then use BFMatcher to find matches between the keypoints across the images.
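Roughly, the detection and matching step looks like this (a sketch with illustrative parameter values, not my exact code):

```python
import cv2

# Sketch of the per-frame step (parameter values are illustrative)
orb = cv2.ORB_create(nfeatures=1000)

def detect_and_match(img1, img2):
    # Detect ORB keypoints and compute binary descriptors in both frames
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)

    # Brute-force matcher with Hamming distance (suited to ORB's binary descriptors)
    bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = bf.match(des1, des2)

    # Basic filtering: keep the best matches by descriptor distance
    matches = sorted(matches, key=lambda m: m.distance)[:200]
    return kp1, kp2, matches
```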
From here I am hoping to compute the distance from the camera to these points. All of the points I am using lie on a plane (at the moment I test using pins on a wall), and I will not need to account for non-planar surfaces at this stage, which should hopefully keep things simple.
I currently have:
I think I have to use triangulation in some form but am not entirely sure how this works. I know I have to cast a ray from each camera (defined by the camera projection matrix?) through each keypoint and find the point where the rays intersect, or rather the point where they come closest to intersecting, since in 3D space two rays are very unlikely to intersect exactly. Also, my keypoint matches are typically quite good because I do some basic filtering, but occasionally a match is wrong, so I need to account for outliers.
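From reading around, my rough understanding is that the OpenCV version of this step would look something like the sketch below (I have not verified this; K is the intrinsic matrix from my calibration, and pts1/pts2 are Nx2 float arrays of matched pixel coordinates):

```python
import numpy as np
import cv2

def triangulate(K, pts1, pts2):
    """Sketch: pts1/pts2 are Nx2 float arrays of matched pixel coordinates
    in frame 1 and frame 2; K is the 3x3 intrinsic matrix from calibration."""
    # Estimate the relative pose between the two camera positions.
    # RANSAC here also rejects the occasional bad match.
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Projection matrices: camera 1 at the origin, camera 2 at (R, t).
    # Note: t recovered from images alone is only known up to scale.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])

    # Linear (DLT) triangulation of each ray pair; this gives the point
    # closest to both rays in a least-squares sense.
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    pts3d = (pts4d[:3] / pts4d[3]).T      # homogeneous -> Euclidean
    depths = pts3d[:, 2]                  # distance along camera 1's optical axis
    return pts3d, depths
```

If this is right, one consequence is that the translation, and therefore the distances, are only recovered up to an unknown scale unless I provide a real-world reference (e.g. the known spacing between the pins on the wall).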
I calibrated the camera using MATLAB beforehand, so I have the focal length, principal point and distortion coefficients. However, all the points that I pull from the images are 2D pixel coordinates. Presumably I need to represent these points in 3D for the triangulation, but I am not sure how.
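My guess at how the calibration output turns a 2D pixel into something 3D is below (a sketch; fx, fy, cx, cy and the distortion coefficients are placeholders for my calibrated values):

```python
import numpy as np
import cv2

# Intrinsic matrix built from the calibration results
# (fx, fy, cx, cy and dist_coeffs are placeholders, not my real values).
fx, fy = 1200.0, 1200.0
cx, cy = 640.0, 360.0
K = np.array([[fx, 0, cx],
              [0, fy, cy],
              [0,  0,  1]], dtype=np.float64)
dist_coeffs = np.array([0.1, -0.25, 0.0, 0.0, 0.0])   # k1, k2, p1, p2, k3

def pixel_to_ray(pts_px):
    """Undistort pixel coordinates and lift them to 3D ray directions in the
    camera frame: each pixel (u, v) maps to the direction K^-1 [u, v, 1]^T."""
    pts_px = np.asarray(pts_px, dtype=np.float64).reshape(-1, 1, 2)
    # undistortPoints returns normalized coordinates (x, y) with z = 1 implied
    norm = cv2.undistortPoints(pts_px, K, dist_coeffs).reshape(-1, 2)
    rays = np.hstack([norm, np.ones((norm.shape[0], 1))])
    return rays / np.linalg.norm(rays, axis=1, keepdims=True)
```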
Or am I taking the wrong approach to this entirely?
Obviously this will be done for every point in the scene, but I have only drawn a single point in. The planes (squares) will always be in the same position, while the camera position will vary from frame to frame. The keypoints are likewise fixed in space, but not every point is detected in every frame.
See Hartley-Sturm's famous paper on optimal triangulation, as well as Kanatani's variant: