I'm having trouble understanding the Perspective-n-Point problem. A few questions:
What is s for? Why do we need a scale factor for the image point?
Is K[R|T] a "change of coordinates matrix" which moves p_w, the homogeneous world point, into the coordinate space of the 2D image plane?
[R|T] represents the "rotation and translation" of the camera relative to the corresponding world point p_w, and that is what we are trying to solve for. What's particularly difficult about this? Can't we just say [R|T] = inv(K) s (p_c) inv(p_w)? I just did this with some basic matrix algebra.
Thanks for any help!
In the typical pinhole camera equation, s p_c = K[R|T] p_w, s represents the Z coordinate of the point in the camera coordinate system.
Right, K[R|T] is the projection matrix, which maps 3D coordinates in some object/world/global coordinate system to 2D image coordinates, as in the equation above.
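To make the role of s concrete, here is a small sketch of the forward projection, assuming the pinhole relation s p_c = K[R|T] p_w. The intrinsics, pose, and world point are made-up illustration values, not taken from the question, and `mat_vec` is a hypothetical helper.

```python
def mat_vec(M, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Assumed example values (for illustration only).
K = [[800.0, 0.0, 320.0],   # focal lengths 800 px, principal point (320, 240)
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
RT = [[1.0, 0.0, 0.0, 0.0],  # R = identity
      [0.0, 1.0, 0.0, 0.0],  # T = (0, 0, 5)
      [0.0, 0.0, 1.0, 5.0]]
p_w = [1.0, 2.0, 3.0, 1.0]   # homogeneous world point

p_cam = mat_vec(RT, p_w)     # point in the camera coordinate system
s_p_c = mat_vec(K, p_cam)    # this is s * (u, v, 1)

s = s_p_c[2]                 # third component is the scale factor
u, v = s_p_c[0] / s, s_p_c[1] / s

print("camera-frame point:", p_cam)
print("s:", s, "pixel:", (u, v))
```

Because the last row of K is (0, 0, 1), the third component of K[R|T] p_w is exactly the camera-frame Z, which is why dividing by s yields pixel coordinates.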
It is not so easy, because you often don't know the point's coordinates in the camera coordinate system; you only know its 2D coordinates in the image coordinate system. The transformation from the camera coordinate system to the image coordinate system loses one dimension, and there is also the scale factor, which makes the equation not exactly linear. Note also that p_w is a 4x1 vector, not a square matrix, so inv(p_w) is not defined. That's why it is not so easy to compute.
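The lost dimension can be seen in a short sketch: every camera-frame point along the same ray lands on the same pixel, so a pixel alone only gives a direction, not a depth. The intrinsics and points are assumed example values, `project` and `back_project` are hypothetical helper names, and zero skew is assumed.

```python
def project(K, p_cam):
    """Pinhole projection of a camera-frame point (X, Y, Z) to a pixel (u, v)."""
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    X, Y, Z = p_cam
    return (fx * X / Z + cx, fy * Y / Z + cy)

def back_project(K, pixel):
    """Return the ray direction (x, y, 1) through a pixel; depth stays unknown."""
    fx, fy = K[0][0], K[1][1]
    cx, cy = K[0][2], K[1][2]
    u, v = pixel
    return ((u - cx) / fx, (v - cy) / fy, 1.0)

# Assumed intrinsics (illustration only, zero skew).
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]

near = (1.0, 2.0, 8.0)
far = (2.0, 4.0, 16.0)       # same ray, twice as far from the camera

pixel_near = project(K, near)
pixel_far = project(K, far)
ray = back_project(K, pixel_near)

print(pixel_near, pixel_far)  # both points hit the same pixel
print(ray)                    # direction only; any positive multiple is valid
```

This is why a single 2D observation cannot be inverted on its own, and why PnP needs several 3D-2D correspondences.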
Different algorithms use different approaches to supply the additional information needed for a solution. For example, the DLT (direct linear transform) method exploits the structure of the projection matrix. Besides analytic solutions, there are also many methods that use nonlinear optimization, for example Levenberg-Marquardt, which is used in OpenCV.
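As a rough illustration of the DLT idea, the sketch below eliminates s from each correspondence to get two linear equations in the 12 entries of the projection matrix, solves them with an SVD, and then uses a known K to pull out R and T. It assumes NumPy, at least 6 non-coplanar points, and noise-free data; `dlt_pose` and all numeric values are hypothetical, and this is not how any particular library implements it.

```python
import numpy as np

def dlt_pose(K, world_pts, image_pts):
    """Estimate R, T from >= 6 non-coplanar 3D-2D correspondences via DLT."""
    rows = []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        Xh = [X, Y, Z, 1.0]
        # From u = (p1.Xh)/(p3.Xh) and v = (p2.Xh)/(p3.Xh), with s eliminated.
        rows.append([*Xh, 0, 0, 0, 0, *(-u * c for c in Xh)])
        rows.append([0, 0, 0, 0, *Xh, *(-v * c for c in Xh)])
    A = np.array(rows, dtype=float)

    # The projection matrix is the null vector of A (defined up to scale).
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)

    # Remove the intrinsics: M is lambda * [R | T] for an unknown scalar lambda.
    M = np.linalg.inv(K) @ P

    # Closest rotation to M[:, :3], plus the scale and sign of lambda.
    U, S, Vt2 = np.linalg.svd(M[:, :3])
    R = U @ Vt2
    scale = S.mean()
    if np.linalg.det(R) < 0:   # lambda was negative: flip the sign
        R, scale = -R, -scale
    T = M[:, 3] / scale
    return R, T

# Synthetic check with made-up values.
rng = np.random.default_rng(0)
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
a, b = 0.3, -0.2
Ry = np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
Rx = np.array([[1, 0, 0], [0, np.cos(b), -np.sin(b)], [0, np.sin(b), np.cos(b)]])
R_true = Ry @ Rx
T_true = np.array([0.1, -0.2, 5.0])

world_pts = rng.uniform(-1.0, 1.0, size=(8, 3))
cam = world_pts @ R_true.T + T_true    # points in the camera frame
proj = cam @ K.T                       # s * (u, v, 1)
image_pts = proj[:, :2] / proj[:, 2:3]

R_est, T_est = dlt_pose(K, world_pts, image_pts)
print("max rotation error:", np.abs(R_est - R_true).max())
print("max translation error:", np.abs(T_est - T_true).max())
```

With noisy pixel measurements, a linear estimate like this is usually only a starting point; a nonlinear method such as Levenberg-Marquardt can then refine R and T by minimizing the reprojection error.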