
Estimate depth of a 2D pixel given intrinsic, extrinsic, and a constraint of Y=0

I have a single-view camera mounted at a height h above the ground. Through calibration I have obtained the intrinsic matrix K and the extrinsic parameters [R|t] (rotation matrix and translation vector), and because I have full access to the camera and the environment, I can measure anything I need. My goal is to estimate the depth of a pixel [u, v] given that I know the pixel lies on the floor (so its 3D point is at y=-h with respect to the camera). Using this constraint, I did the following (without success):

  • create a new 3D point P1 from [u, v], the principal point (cx, cy) and the focal length f: [u - cx, v - cy, f]
  • multiply P1 by the inverse of my camera matrix K and call the result P2
  • multiply P2 by the inverse of the [R|t] matrix and call the result P3
  • P3 is a 4x1 vector, so I normalize it and reduce it to a 3x1 vector [X1, Y1, Z1]. This point should be the world-coordinate projection of my [u, v] point
  • solve for X and Z with Y=-h as follows:
    • X = X1 * (-h / Y1)
    • Z = Z1 * (-h / Y1)

Unfortunately, it doesn't look right! I have been stuck on this problem for two weeks now, so it would be great to get some help from the community. I'm sure I'm missing something obvious.

Thanks again


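  • The homogeneous image coordinate is P1 = [u,v,1], or any scalar multiple of it such as [f*u,f*v,f]. Do not subtract the principal point yourself: K⁻¹ already does that.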

    Multiplying by the inverse of the camera matrix gives you the direction of the ray on which the 3D point lies: P2 ~= K⁻¹ * P1 (~= denotes equality up to a scale factor).
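A minimal NumPy sketch of this back-projection, using hypothetical intrinsics (fx = fy = 800 px, principal point (320, 240)) and an arbitrary pixel (400, 300). It checks that K⁻¹ * [u, v, 1] already equals the normalized coordinates [(u - cx)/f, (v - cy)/f, 1], whereas pushing the question's hand-built [u - cx, v - cy, f] through K⁻¹ removes the principal point a second time and yields a different ray.

```python
import numpy as np

# Hypothetical intrinsics: fx = fy = 800 px, principal point (cx, cy) = (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
K_inv = np.linalg.inv(K)
f, cx, cy = K[0, 0], K[0, 2], K[1, 2]

u, v = 400.0, 300.0  # arbitrary pixel

# Homogeneous pixel [u, v, 1] through K^-1: the ray direction P2.
P2 = K_inv @ np.array([u, v, 1.0])

# K^-1 already subtracts the principal point and divides by f.
normalized = np.array([(u - cx) / f, (v - cy) / f, 1.0])

# The question's step 1 subtracts the principal point by hand, then K^-1
# subtracts it again, so the resulting direction is different.
P2_question = K_inv @ np.array([u - cx, v - cy, f])

print(np.allclose(P2, normalized))                             # True
print(np.allclose(P2_question / P2_question[2], normalized))   # False
```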

    Assume the camera is located at C, which is (0,0,0,1) in the camera's own coordinate system, and that the vector P2 has the form [x,y,z,0]. (The zero at the end makes it translation-invariant, so transforming it only rotates it.)

    Then the 3D point you are looking for is located at C + k*P2 and you must solve for the variable k.
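Why k has to be solved for can be checked numerically. In this sketch (same hypothetical intrinsics and pixel as a worked example; `pixel_ray` and `project` are illustrative helper names, not from any library), every point C + k*P2 with k > 0 projects back onto the same pixel, so the image alone cannot fix k and an extra constraint such as the floor plane is required.

```python
import numpy as np

# Hypothetical intrinsics: fx = fy = 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

def pixel_ray(K, u, v):
    """Camera-frame direction of the ray through pixel (u, v): K^-1 [u, v, 1]."""
    return np.linalg.solve(K, np.array([u, v, 1.0]))

def project(K, X_cam):
    """Pinhole projection of a camera-frame 3D point to pixel coordinates."""
    p = K @ X_cam
    return p[:2] / p[2]

u, v = 400.0, 300.0
C = np.zeros(3)          # camera centre in its own frame (Euclidean part of (0,0,0,1))
P2 = pixel_ray(K, u, v)  # Euclidean part of the direction [x, y, z, 0]

# Every C + k * P2 with k > 0 lands on the same pixel, so the image alone
# cannot determine k; the floor constraint is what pins it down.
for k in (0.5, 2.0, 10.0):
    print(k, project(K, C + k * P2))
```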

    C2 = Rt⁻¹ * C
    P3 = Rt⁻¹ * P2
    P4 = C2 + k * P3

    C2 is the camera position in world coordinates, P3 is the ray direction in world coordinates, and P4 is your point at Y=-h.
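A sketch of how the camera position C2 and the world-space ray direction P3 come out of the 4x4 inverse of [R|t]. It assumes the common convention that [R|t] maps world to camera coordinates (X_cam = R * X_world + t); if your calibration uses the opposite direction, the inverse is not needed. The rotation and translation values are made up, and `rt_inverse` is a hypothetical helper. Because C has w = 1 and the direction has w = 0, the inverse gives C2 = -Rᵀt and P3 = RᵀP2: the translation moves the point but not the direction.

```python
import numpy as np

def rt_inverse(R, t):
    """4x4 inverse of the world-to-camera transform [R|t] (assumes X_cam = R @ X_world + t)."""
    M = np.eye(4)
    M[:3, :3] = R.T
    M[:3, 3] = -R.T @ t
    return M

# Hypothetical extrinsics: rotation of 10 degrees about the x axis, arbitrary translation.
a = np.deg2rad(10.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
t = np.array([0.0, 1.5, 0.0])

Rt_inv = rt_inverse(R, t)
C = np.array([0.0, 0.0, 0.0, 1.0])     # camera centre: homogeneous point (w = 1)
P2 = np.array([0.1, 0.075, 1.0, 0.0])  # ray direction: homogeneous vector (w = 0)

C2 = Rt_inv @ C    # camera position in world coordinates (= -R^T t)
P3 = Rt_inv @ P2   # ray direction in world coordinates (= R^T P2, translation drops out)

print(C2[:3], P3[:3])
```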

    Finally, plug in your constraint Y=-h and calculate k using the y components:

    k = (-h - C2_y) / P3_y
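Putting the steps together, here is a minimal end-to-end sketch in NumPy. It keeps the same assumptions: [R|t] maps world to camera coordinates, and the floor is the world plane Y=-h as in the question; `floor_point_from_pixel` is a hypothetical name and the calibration values are invented. Because P2 = K⁻¹ * [u, v, 1] has a z component of 1, the solved k is directly the depth along the optical axis, and the Euclidean distance to the point is k * |P2|. The usage projects a known floor point into the image and checks that it is recovered.

```python
import numpy as np

def floor_point_from_pixel(K, R, t, u, v, h):
    """Intersect the ray through pixel (u, v) with the floor plane Y = -h.

    Assumes [R|t] maps world to camera coordinates (X_cam = R @ X_world + t).
    Returns the world point P4, the camera-frame depth Z_cam, and the
    Euclidean distance from the camera centre to P4.
    """
    P2 = np.linalg.solve(K, np.array([u, v, 1.0]))  # ray direction, camera frame (z = 1)
    C2 = -R.T @ t                                   # camera centre, world frame
    P3 = R.T @ P2                                   # ray direction, world frame
    if abs(P3[1]) < 1e-12:
        raise ValueError("ray is parallel to the floor plane")
    k = (-h - C2[1]) / P3[1]                        # from C2_y + k * P3_y = -h
    if k <= 0:
        raise ValueError("the floor plane is only hit behind the camera")
    P4 = C2 + k * P3
    depth_z = k * P2[2]                             # equals k because P2[2] == 1
    distance = k * np.linalg.norm(P2)
    return P4, depth_z, distance

# Hypothetical calibration: fx = fy = 800 px, principal point (320, 240),
# camera rotated -20 degrees about its x axis, arbitrary translation.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
a = np.deg2rad(-20.0)
R = np.array([[1.0, 0.0, 0.0],
              [0.0, np.cos(a), -np.sin(a)],
              [0.0, np.sin(a), np.cos(a)]])
t = np.array([0.1, -0.2, 0.3])
h = 1.5

# Round trip: project a known floor point, then recover it from its pixel.
X_true = np.array([0.5, -h, 4.0])
x_cam = R @ X_true + t
u, v = (K @ x_cam)[:2] / x_cam[2]

P4, depth_z, distance = floor_point_from_pixel(K, R, t, u, v, h)
print(P4, depth_z, distance)
```

The k > 0 check rejects rays that only meet the floor plane behind the camera, for example pixels above the horizon.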