I am currently using OpenCV for pose-estimation work, in which I triangulate points between image pairs for reconstruction and scale-factor estimation. I have run into a strange issue with the OpenCV functions recoverPose() and triangulatePoints().
Say I have camera 1 and camera 2, spaced apart in X, with cam1 at (0,0,0) and cam2 to the right of it (positive X). I have two arrays points1 and points2 that are the matched features between the two images. According to the OpenCV documentation and code, I have noted two points:
triangulatePoints() is called twice: once inside recoverPose(), to determine which of the four R/t combinations is valid, and then again from my code. For the latter, the documentation says:
cv::triangulatePoints(P1, P2, points1, points2, points3D) : points1 -> P1 and points2 -> P2.
Hence, as in the case of recoverPose(), it is safe to assume that P1 is [I|0] and P2 is [R|t].
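Reading the two together, here is a minimal sketch of what I understand the prescribed usage to be (C++; K is assumed to be my calibrated camera matrix in CV_64F, points1/points2 the matched features from image 1 and image 2; masks and error handling omitted):

```cpp
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>
#include <vector>

// Triangulate matched features the way I read the documentation:
// points1 is observed by P1 = K[I|0], points2 by P2 = K[R|t].
cv::Mat triangulatePrescribed(const cv::Mat& K,
                              const std::vector<cv::Point2f>& points1,
                              const std::vector<cv::Point2f>& points2)
{
    cv::Mat E = cv::findEssentialMat(points1, points2, K, cv::RANSAC);

    cv::Mat R, t;
    cv::recoverPose(E, points1, points2, K, R, t);

    // P1 = K[I|0] for camera 1, P2 = K[R|t] for camera 2.
    cv::Mat P1 = K * cv::Mat::eye(3, 4, CV_64F);
    cv::Mat Rt;
    cv::hconcat(R, t, Rt);
    cv::Mat P2 = K * Rt;

    cv::Mat points4D;
    cv::triangulatePoints(P1, P2, points1, points2, points4D);

    // Dehomogenize: N 3-D points in camera 1's coordinate frame.
    cv::Mat points3D;
    cv::convertPointsFromHomogeneous(points4D.t(), points3D);
    return points3D;
}
```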
What I actually found: it doesn't work that way. Although my camera 1 is at (0,0,0) and camera 2 is at (1,0,0) (the 1 being up to scale), the only correct configuration is obtained if I run
```cpp
recoverPose(E, points2, points1, ...)
triangulatePoints([I|0], [R|t], points2, points1, pts3D)
```
which should be incorrect, because points2 is the set observed by the [R|t] camera, not points1. To test this, I used an image pair of a scene in my room containing three noticeable objects after triangulation: a monitor and two posters on the wall behind it. Here are the point clouds resulting from the triangulation (excuse the MS Paint):
If I do it OpenCV's prescribed way (the poster points are dispersed in space; a weird-looking result):
If I do it my (wrong?) way:
Can anybody share their views about what's going on here? Technically, both solutions are valid, because all points fall in front of both cameras, and I didn't know which to pick until I rendered them as point clouds. Am I doing something wrong, or is it an error in the documentation? I am not that knowledgeable about the theory of computer vision, so it is possible I am missing something fundamental here. Thanks for your time!
I ran into a similar issue. I believe OpenCV defines the translation vector in the opposite way from what one would expect. With your camera configuration the translation vector would be [-1, 0, 0]. It is counterintuitive, but both recoverPose() and stereoCalibrate() return this translation vector.
I found that when I used the incorrect but intuitive translation vector (e.g. [1, 0, 0]), I couldn't get correct results unless I swapped points1 and points2, just as you did.
I suspect the translation vector actually maps points into the other camera's coordinate system; it is not the vector that translates the camera pose. The OpenCV documentation seems to imply that this is the case:
The joint rotation-translation matrix [R|t] is called a matrix of extrinsic parameters. It is used to describe the camera motion around a static scene, or vice versa, rigid motion of an object in front of a still camera. That is, [R|t] translates coordinates of a point (X,Y,Z) to a coordinate system, fixed with respect to the camera.
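If that is the right reading, then [R|t] maps a point from camera 1's (world) frame into camera 2's frame, X_cam2 = R*X + t, and the position of camera 2 itself is recovered by solving R*C + t = 0. A quick sanity check along those lines (a sketch using the values from your configuration, assuming the scale-normalized output of recoverPose()):

```cpp
#include <iostream>
#include <opencv2/core.hpp>

int main()
{
    // Extrinsics as recoverPose() reports them for your scene:
    // identity rotation, translation [-1, 0, 0] (up to scale).
    cv::Mat R = cv::Mat::eye(3, 3, CV_64F);
    cv::Mat t = (cv::Mat_<double>(3, 1) << -1.0, 0.0, 0.0);

    // [R|t] maps camera-1 coordinates into camera-2 coordinates, so
    // camera 2's center C satisfies R*C + t = 0  =>  C = -R^T * t.
    cv::Mat C = -R.t() * t;
    std::cout << "camera 2 center: " << C.t() << std::endl; // [1, 0, 0]
    return 0;
}
```

So the "backwards" t = [-1, 0, 0] still places camera 2 at (1, 0, 0), exactly where you expect it.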
Wikipedia has a nice description of Translation of axes:
In the new coordinate system, the point P will appear to have been translated in the opposite direction. For example, if the xy-system is translated a distance h to the right and a distance k upward, then P will appear to have been translated a distance h to the left and a distance k downward in the x'y'-system.
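Applied to your configuration (a toy check with the same R = I, t = [-1, 0, 0] as above): the camera-2 axes sit one unit to the right of camera 1's, so every point appears shifted one unit to the left in camera 2's frame, which is exactly what the OpenCV translation encodes:

```cpp
#include <iostream>
#include <opencv2/core.hpp>

int main()
{
    cv::Mat R = cv::Mat::eye(3, 3, CV_64F);
    cv::Mat t = (cv::Mat_<double>(3, 1) << -1.0, 0.0, 0.0);

    // Camera 1's own center, expressed in camera 2's frame: it
    // appears one unit to the LEFT, per the Wikipedia description.
    cv::Mat X = cv::Mat::zeros(3, 1, CV_64F);
    cv::Mat X_cam2 = R * X + t;
    std::cout << X_cam2.t() << std::endl; // [-1, 0, 0]
    return 0;
}
```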