I want to find camera coordinates from pixel/screen coordinates using OpenCV.
Suppose my camera is calibrated and I have the intrinsic parameters (matrix with focal length and principal point) and the extrinsic parameters (rotation matrix and translation vector) from OpenCV. Then this website about 3D reconstruction with OpenCV says:
s * [q 1]^T = [K] * ( [R] * [P] + [T] )

where [q] is the 2D pixel coordinate, s = 1, [K] is the (3x3) intrinsic matrix, [R] is the (3x3) rotation matrix, [P] is the (3x1) point in world coordinates, and [T] is the (3x1) translation vector.

So:

[R]^{-1} * ( [K]^{-1} * [q 1]^T - [T] ) = [P]

And then:

[U] = [R] * [P] + [T]

where [U] is the (3x1) point in camera coordinates. So [q], which is in pixel coordinates, is converted to the camera coordinate [U].
Am I right to convert pixel coordinates to camera coordinates like this? Are the rotation matrix ([R]) and the intrinsic matrix ([K]) always invertible, or are there cases where the rotation matrix and/or the intrinsic matrix can't be inverted?
Could you kindly confirm this?
I am too lazy to check that for you (really, you should test it yourself, and if you run into problems, come back here for help).
But you can use this nice code snippet; it's not exactly what you want, but the fundamentals are right:
Opencv virtually camera rotating/translating for bird's eye view
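As a rough sanity check of the algebra in the question, here is a minimal numpy sketch (the K, R, T values below are made up for illustration; in practice they come from cv2.calibrateCamera or cv2.solvePnP). One caveat: s = 1 only holds for points at unit depth; in general s equals the point's z-coordinate in the camera frame, so keep it explicit:

```python
import numpy as np

# Hypothetical calibration values, for illustration only.
K = np.array([[800.0,   0.0, 320.0],          # fx,  0, cx
              [  0.0, 800.0, 240.0],          #  0, fy, cy
              [  0.0,   0.0,   1.0]])
theta = np.deg2rad(10.0)                      # rotation about the z-axis
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([[0.1], [0.2], [5.0]])           # translation vector

def pixel_to_camera(q, K, s):
    # U = s * K^-1 * [u, v, 1]^T.  K is invertible as long as fx != 0
    # and fy != 0, which always holds for a real camera.
    q_h = np.array([[q[0]], [q[1]], [1.0]])
    return s * np.linalg.inv(K) @ q_h

def camera_to_world(U, R, T):
    # Invert U = R P + T to get P = R^-1 (U - T).  A rotation matrix is
    # orthogonal, so it is always invertible and R^-1 == R^T.
    return R.T @ (U - T)

# Round trip: project a world point to a pixel, then recover it.
P = np.array([[0.5], [-0.3], [4.0]])          # arbitrary world point
U = R @ P + T                                 # camera coordinates
q_h = K @ U                                   # equals s * [u, v, 1]^T
s = float(q_h[2, 0])                          # s is the depth U_z, not 1 here
q = (q_h[0, 0] / s, q_h[1, 0] / s)            # pixel coordinates

U_back = pixel_to_camera(q, K, s)
P_back = camera_to_world(U_back, R, T)
print(np.allclose(U_back, U), np.allclose(P_back, P))  # -> True True
```

Note that the two steps compose to U = s * K^{-1} * [q 1]^T directly, so if you only need camera coordinates (not world coordinates), R and T are not involved at all; without knowing s (the depth), a single pixel only determines the ray through the camera center, not a unique 3D point.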