I have a single-view camera at a known height (h) above the ground. Through calibration I have obtained the intrinsic matrix K and the extrinsics [R|t] (rotation matrix and translation vector), and because I have full access to the camera and the environment I can measure whatever I want. My goal is to estimate the depth of a pixel [u,v] in the image, given that I know the pixel lies on the floor (so it is at y = -h with respect to the camera). With this constraint, I tried the following, without success:
Unfortunately it doesn't look right! I have been stuck on this problem for two weeks now, so it would be really great to get some help from the community. I'm sure it's something obvious that I am missing.
Thanks again
The homogeneous image coordinate is P1 = [u, v, 1], or equivalently [f*u, f*v, f].
Multiplying by the inverse of the camera matrix gives you a ray along which the 3D point lies:
P2 ~= K⁻¹ * P1 (~= is equality up to a scale factor)
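For concreteness, here is a minimal NumPy sketch of this back-projection step; the intrinsic values and the pixel are placeholders, not values from your setup:

```python
import numpy as np

# Placeholder intrinsics; substitute the K from your calibration.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

u, v = 400.0, 60.0               # pixel of interest (placeholder values)
P1 = np.array([u, v, 1.0])       # homogeneous image coordinate
P2 = np.linalg.inv(K) @ P1       # ray direction in camera coordinates, up to scale
```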
Let's assume the camera is located at C (which is (0,0,0,1) in the camera's own coordinate system), and that the vector P2 has the homogeneous form [x, y, z, 0]. (The zero at the end makes it translation invariant!)
Then the 3D point you are looking for lies at C + k*P2, and you must solve for the scalar k.
Transforming this into world coordinates with the inverse of the 4x4 extrinsic matrix Rt gives

P4 = Rt⁻¹ * (C + k*P2) = C2 + k * P3

where C2 = Rt⁻¹ * C is the camera position in world coordinates and P3 = Rt⁻¹ * P2 is the ray direction in world coordinates (because of the zero w component, only the rotation acts on it). P4 is your point, which must satisfy Y = -h.
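Continuing the sketch above, the transform could look like this with a 4x4 homogeneous extrinsic matrix; R, t, and the ray are placeholders standing in for your calibrated values:

```python
import numpy as np

# Placeholder extrinsics (world -> camera); substitute your calibrated R and t.
R = np.eye(3)
t = np.zeros(3)

Rt = np.eye(4)                       # 4x4 homogeneous matrix [R t; 0 0 0 1]
Rt[:3, :3] = R
Rt[:3, 3] = t
Rt_inv = np.linalg.inv(Rt)

C  = np.array([0.0, 0.0, 0.0, 1.0])  # camera centre: homogeneous point, w = 1
P2 = np.array([0.1, -0.225, 1.0])    # ray from the previous snippet (placeholder)

C2 = Rt_inv @ C                      # camera position in world coordinates
P3 = Rt_inv @ np.append(P2, 0.0)     # ray direction in world coordinates, w stays 0
```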
Finally, plug in your constraint Y = -h and solve for k using the Y components:

k = (-h - C2_y) / P3_y

Substituting this k back into P4 = C2 + k*P3 gives the 3D point on the floor.
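Putting it all together, here is a sketch of the whole pipeline; floor_depth is a hypothetical helper name, the conventions (world-to-camera extrinsics, Y pointing up) are assumptions, and the calibration numbers are placeholders:

```python
import numpy as np

def floor_depth(u, v, K, Rt, h):
    """Depth of pixel (u, v), assuming it images a point on the floor Y = -h.

    K  : 3x3 intrinsic matrix
    Rt : 4x4 homogeneous extrinsic matrix (world -> camera)
    h  : camera height above the floor
    """
    P1 = np.array([u, v, 1.0])                    # homogeneous pixel
    P2 = np.linalg.inv(K) @ P1                    # ray in camera coordinates
    Rt_inv = np.linalg.inv(Rt)
    C2 = Rt_inv @ np.array([0.0, 0.0, 0.0, 1.0])  # camera centre in world coords
    P3 = Rt_inv @ np.append(P2, 0.0)              # ray direction in world coords
    k = (-h - C2[1]) / P3[1]                      # intersect ray with plane Y = -h
    P4 = C2 + k * P3                              # 3D floor point, world coords
    depth = k * P2[2]                             # Z of k*P2, the point in camera coords
    return depth, P4[:3]

# Placeholder calibration; with identity extrinsics the world frame is the camera frame.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
Rt = np.eye(4)

depth, point = floor_depth(400.0, 60.0, K, Rt, h=1.5)
print(depth, point)   # point[1] should come out as -h
```

A quick sanity check: if k comes out negative, the intersection lies behind the camera, which usually means the pixel is above the horizon or the sign convention of the Y axis (up vs. down) differs from the one assumed here.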