Assume I have good correspondences between two images and want to recover the camera motion between them. I can use OpenCV 3's new facilities for this:
// imgpts1, imgpts2: matched image points; focal, principalPoint: camera intrinsics
Mat R, t, mask;
Mat E = findEssentialMat(imgpts1, imgpts2, focal, principalPoint, RANSAC, 0.999, 1, mask);
int inliers = recoverPose(E, imgpts1, imgpts2, R, t, focal, principalPoint, mask);
// Decompose R to read off the Euler angles (RQDecomp3x3 returns them in degrees)
Mat mtxR, mtxQ;
Mat Qx, Qy, Qz;
Vec3d angles = RQDecomp3x3(R, mtxR, mtxQ, Qx, Qy, Qz);
cout << "Translation: " << t.t() << endl;
cout << "Euler angles [x y z] in degrees: " << angles.t() << endl;
Now, I have trouble wrapping my head around what R and t actually mean. Are they the transform needed to map coordinates from camera space 1 to camera space 2, as in p_2 = R * p_1 + t?
Consider this example, with manually labeled ground-truth correspondences.
The output I get is this:
Translation: [-0.9661243151855488, -0.04921320381132761, 0.253341406362796]
Euler angles [x y z] in degrees: [9.780449804801876, 46.49315494782735, 15.66510133665445]
I try to match this to what I see in the image and come up with the following interpretation: [-0.96, -0.04, 0.25] tells me that I have moved to the right, as the coordinates have moved along the negative x-axis, but it would also tell me that I have moved further away, as the coordinates have moved along the positive z-axis.
I have also rotated the camera around the y-axis (to the left, which I think would be a counter-clockwise rotation around the negative y-axis, because in OpenCV the y-axis points downwards, does it not?).
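One way to sanity-check this reading is to compute where camera 2 sits in camera 1's coordinate frame. The sketch below assumes the relation p_2 = R * p_1 + t holds; setting p_2 = 0 then gives the centre of camera 2 as -R^T * t. It uses plain C++ arrays rather than OpenCV types so that it stands alone, and the names Vec3, Mat3 and camera2Centre are made up for illustration.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Position of camera 2 expressed in camera-1 coordinates,
// ASSUMING the convention p2 = R * p1 + t.
// Setting p2 = 0 and solving for p1 gives p1 = -R^T * t.
Vec3 camera2Centre(const Mat3& R, const Vec3& t) {
    Vec3 c{};  // zero-initialised
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            c[i] -= R[j][i] * t[j];  // R[j][i] indexes the transpose of R
    return c;
}
```

If R were the identity, t = [-0.966, -0.049, 0.253] would give a centre of [0.966, 0.049, -0.253], that is, to the right (+x) and backwards (-z) in camera 1's frame. With a non-trivial rotation the centre is rotated accordingly, so t alone does not give the direction of motion.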
Question: Is my interpretation correct, and if not, what is the correct one?
It turns out my interpretation is correct: the relation p2 = R * p1 + t does indeed hold. One can verify this by using cv::triangulatePoints() and cv::convertPointsFromHomogeneous() to obtain 3D coordinates from corresponding points (relative to camera 1) and then applying the above equation. Multiplying the result by camera 2's camera matrix and dividing by the third component then yields the p2 image coordinates.
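The last two steps of that check (applying the equation, then projecting) can be sketched without OpenCV as follows. This is only an illustration: the helper names toCamera2 and project are hypothetical, and the intrinsics are assumed to be a single focal length plus a principal point (cx, cy), as in the findEssentialMat call above. The 3D point p1 would come from the triangulation step, which is not reproduced here.

```cpp
#include <array>

using Vec3 = std::array<double, 3>;
using Mat3 = std::array<std::array<double, 3>, 3>;

// Map a 3D point from camera-1 coordinates to camera-2 coordinates:
// p2 = R * p1 + t.
Vec3 toCamera2(const Mat3& R, const Vec3& t, const Vec3& p1) {
    Vec3 p2{};
    for (int i = 0; i < 3; ++i) {
        p2[i] = t[i];
        for (int j = 0; j < 3; ++j)
            p2[i] += R[i][j] * p1[j];
    }
    return p2;
}

// Pinhole projection with an assumed camera matrix
// K = [focal 0 cx; 0 focal cy; 0 0 1], including the division by depth.
// The caller must make sure p[2] is non-zero.
std::array<double, 2> project(const Vec3& p, double focal, double cx, double cy) {
    return { focal * p[0] / p[2] + cx, focal * p[1] / p[2] + cy };
}
```

For example, with R the identity, t = (-1, 0, 0) and p1 = (1, 0, 2), toCamera2 returns (0, 0, 2), which project maps onto the principal point (cx, cy). In the real check, the projected result should lie close to the measured point in image 2 for every inlier.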