Tags: opencv, computer-vision, triangulation, structure-from-motion

# Understanding cv2.recoverPose's coordinate frame transformations

I am having trouble understanding the behavior of `cv2.recoverPose(points1, points2)`. From my understanding of the documentation, this function should return the rotation matrix from camera1 to camera2 as well as the translation from camera1 to camera2.

I am using an artificial set of points, `points_3d`, as well as the transformation between two cameras, `T_1_0`, to recover the camera transformation in Python. To get the points on the cameras, I calculate the projections of the 3D points onto the camera sensors:

```
import cv2
import numpy as np

def calc_projection(K_c, transform, pt_3d):
    # Project a 3D world point onto the image plane of a camera with
    # intrinsics K_c and 3x4 extrinsic matrix `transform`.
    P = np.hstack((K_c, np.zeros((3, 1)))) @ np.linalg.inv(np.vstack((transform, [0, 0, 0, 1])))
    p_projected = P @ np.vstack((pt_3d.reshape(3, 1), 1))
    p_projected = p_projected[:2, :] / p_projected[2, :]
    return p_projected.ravel()

points_3d = np.random.rand(100, 3)

K_c = np.eye(3)

T_0 = np.hstack((np.eye(3), np.zeros((3, 1))))
rot_vec = np.array([0.2, 0.1, 0.3])
R_1_0, _ = cv2.Rodrigues(rot_vec)
t_0_10 = np.array([0.2, 0.4, 0.1])
T_1_0 = np.hstack((R_1_0, t_0_10.reshape(3, 1)))

points1 = []
points2 = []
for pt_3d in points_3d:
    points1.append(calc_projection(K_c, T_0, pt_3d))
    points2.append(calc_projection(K_c, T_1_0, pt_3d))

points1 = np.array(points1)
points2 = np.array(points2)

E, mask = cv2.findEssentialMat(points1, points2, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, points1, points2)

r, _ = cv2.Rodrigues(R)
assert np.allclose(r.ravel(), rot_vec)
assert np.allclose(t.ravel(), t_0_10)
```

I would expect the result to be equal to `T_1_0` (hence the assertions), but instead the result is:

```
r = [[0.20329041]
     [0.15711541]
     [0.37188371]]

t = [[0.50969714]
     [0.79593836]
     [0.32663581]]
```

What am I missing here? Why does it not work as expected? Am I doing something wrong, or is this the expected behavior?

## Edit

The formula I've used for the projection wrongly introduces the inverse of the transformation. It should be the following instead:

```
p_projected = np.hstack((K_c, np.zeros((3, 1)))) @ np.vstack((transform, [0, 0, 0, 1])) @ np.vstack((pt_3d.reshape(3, 1), 1))
```
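As a sanity check of that convention, here is a minimal sketch with made-up values (identity intrinsics, no rotation, a pure z-offset): the 3x4 extrinsic matrix is applied directly to the homogeneous world point, with no inverse.

```python
import numpy as np

# Hypothetical setup: identity intrinsics, no rotation, and an extrinsic
# matrix that shifts world points by +1 along z into the camera frame.
K_c = np.eye(3)
R = np.eye(3)
t = np.array([[0.0], [0.0], [1.0]])
T = np.hstack((R, t))                        # 3x4 extrinsic matrix

pt_world = np.array([0.5, 0.25, 1.0, 1.0])   # homogeneous world point
p = K_c @ T @ pt_world                       # world -> image, no inverse
p = p[:2] / p[2]                             # perspective divide -> (0.25, 0.125)
```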

## Side Note

`solvePnP` also solves the problem I've been trying to solve here (recovering the camera pose from 3D→2D correspondences):

```
_, r, t = cv2.solvePnP(points_3d, points2, K_c, None)
assert np.allclose(r, rot_vec.reshape(3, 1), rtol=0.15)
assert np.allclose(t, t_0_10.reshape(3, 1), rtol=0.15)
```

But I still don't know why `cv2.recoverPose` does not work. According to the documentation, it should also return the translation and rotation...

# Solution

## TL;DR

Most important findings:

• The `cv2.recoverPose(points1, points2, ...)` function returns the rotation from camera1 to camera2, given that `points1` were observed by camera1 and `points2` by camera2.

• The returned translation vector also goes from camera1 to camera2, expressed in the coordinate frame of camera1.

• The translation vector is recovered only up to a scale factor, which can't be determined without further logic.

• Setting the `cameraMatrix` argument on `findEssentialMat` (and `recoverPose`) is important.
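The first two bullets describe the mapping `x2 = R @ x1 + t` between the two camera frames. A tiny sketch with a made-up pose (these numbers are stand-ins for `recoverPose`'s outputs, not values from the question) illustrates it:

```python
import numpy as np

# Hypothetical pose standing in for recoverPose's outputs:
# a 90-degree rotation about z plus a translation.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.2, 0.4, 0.1])

x1 = np.array([1.0, 0.0, 0.0])   # a point expressed in camera1 coordinates
x2 = R @ x1 + t                  # the same point in camera2 coordinates
# x2 is (0.2, 1.4, 0.1)
```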

## Description

I am now able to use `cv2.recoverPose` and recover the correct rotation and translation. Here is the updated code:

```
E, mask = cv2.findEssentialMat(points1, points2, cameraMatrix=K_c)
_, R, t, _ = cv2.recoverPose(E, points1, points2, cameraMatrix=K_c)

r, _ = cv2.Rodrigues(R)
```

The results are

```
# As expected
r = [[0.2]
     [0.1]
     [0.3]]

# expected: [0.2, 0.4, 0.1], but is:
t = [[0.43643578]
     [0.87287156]
     [0.21821789]]
```

But! The documentation states that the translation can be recovered only up to a scale factor. So in this case the following assertion holds, and the behavior is as expected:

```
factors = t.ravel() / t_0_10
assert np.allclose(factors, factors[0])
```
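Plugging in the numbers printed above confirms this, and also shows where the factor comes from: `recoverPose` normalizes the translation to unit length. A quick check reusing the values from this answer:

```python
import numpy as np

# Values from the output above: recovered t and the ground truth t_0_10.
t = np.array([0.43643578, 0.87287156, 0.21821789])
t_0_10 = np.array([0.2, 0.4, 0.1])

factors = t / t_0_10
assert np.allclose(factors, factors[0])      # same factor in every component

# The recovered translation has unit length, so the common factor
# is 1 / ||t_0_10||.
assert np.allclose(np.linalg.norm(t), 1.0)
assert np.allclose(factors[0], 1.0 / np.linalg.norm(t_0_10))
```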

Comment: What's funny is that with `K_c = np.eye(3)` it sometimes works and sometimes doesn't. But with the intrinsics of a real camera, the assertion always holds:

```
K_c = np.array([
    [485.0, 0.0, 320.0],
    [0.0, 485.0, 240.0],
    [0.0, 0.0, 1.0],
])
```