c++, math, computer-vision, mathematical-expressions

How do you judge the (real world) distance of an object in a picture?


I am building a recognition program in C++ and to make it more robust, I need to be able to find the distance of an object in an image.

Say I have an image that was taken 22.3 inches away from an 8.5 x 11 inch picture. The system correctly identifies that picture in a box with dimensions 319 pixels by 409 pixels.
What is an effective way to relate the actual height and width (AH and AW) and the pixel height and width (PH and PW) to the distance (D)?

I am assuming that when I actually go to use the equation, PH and PW will be inversely proportional to D, and that AH and AW are constants (since the recognized object will always be one whose width and height the user can indicate).


Solution

  • I don't know if you changed your question at some point, but my first answer is quite complicated for what you want. You can probably do something simpler.

    1) Long and complicated solution (for more general problems)

    First you need to know the size of the object.

    You should look at computer vision algorithms. If you know the object (its dimensions and shape), your main problem is pose estimation, that is, finding the position of the object relative to the camera; from the pose you can find the distance. You can look at [1] [2] (for example; you can find other articles on the topic if you are interested) or search for POSIT and SoftPOSIT. You can formulate the problem as an optimization problem: find the pose that minimizes the "difference" between the real image and the expected image (the projection of the object given the estimated pose). This difference is usually the sum of the (squared) distances between each image point Ni and the projection P(Mi) of the corresponding 3D object point Mi for the current parameters; a sketch of this cost is given below.
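    To make that cost concrete, here is a minimal sketch of the sum-of-squared-distances term, assuming OpenCV; cv::projectPoints produces the P(Mi) for a candidate pose, and the point vectors and calibration data are placeholders you would fill in:

    #include <opencv2/calib3d.hpp>
    #include <opencv2/core.hpp>
    #include <vector>

    // Reprojection cost: sum over i of |Ni - P(Mi)|^2, where P projects the
    // 3D model points Mi with the candidate pose (rvec, tvec).
    double reprojectionCost(const std::vector<cv::Point3f>& modelPts,  // Mi
                            const std::vector<cv::Point2f>& imagePts,  // Ni
                            const cv::Mat& K, const cv::Mat& distCoeffs,
                            const cv::Mat& rvec, const cv::Mat& tvec)
    {
        std::vector<cv::Point2f> projected;
        cv::projectPoints(modelPts, rvec, tvec, K, distCoeffs, projected); // P(Mi)

        double cost = 0.0;
        for (std::size_t i = 0; i < imagePts.size(); ++i) {
            const cv::Point2f d = imagePts[i] - projected[i];
            cost += d.x * d.x + d.y * d.y; // squared image-space distance
        }
        return cost; // pose estimation minimizes this over (rvec, tvec)
    }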

    From this you can extract the distance.

    For this you need to calibrate your camera (roughly, find the relation between pixel position and viewing angle).

    Now, you may not want to code all of this by yourself; you can use computer vision libraries such as OpenCV, Gandalf [3], etc. OpenCV, for instance, implements this pose-estimation step directly, as sketched below.
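    As an illustration, OpenCV's cv::solvePnP solves exactly this pose problem from 2D-3D point correspondences; here is a minimal sketch, where the point lists and the calibration matrix are placeholders you would fill in:

    #include <opencv2/calib3d.hpp>
    #include <opencv2/core.hpp>
    #include <vector>

    int main()
    {
        // 3D points of the known object, in your chosen units (e.g. inches);
        // solvePnP needs at least 4 correspondences.
        std::vector<cv::Point3f> modelPts = { /* e.g. corners of the picture */ };
        // Matching 2D detections in the image, in pixels.
        std::vector<cv::Point2f> imagePts = { /* detected corners */ };

        cv::Mat K;          // 3x3 intrinsic matrix from camera calibration
        cv::Mat distCoeffs; // lens distortion coefficients (may be empty)

        cv::Mat rvec, tvec; // outputs: rotation (Rodrigues) and translation
        cv::solvePnP(modelPts, imagePts, K, distCoeffs, rvec, tvec);

        // tvec is the object's position in camera coordinates, so its norm
        // is the camera-to-object distance, in the model's units.
        double distance = cv::norm(tvec);
        return 0;
    }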

    Now, you may want to do something simpler (and approximate). If you can find the image distance between two points at the same "depth" Z from the camera, you can relate the image distance d to the real distance D with d = a D / Z, where a is a parameter of the camera related to the focal length and the pixel size, which you can find using camera calibration.

    2) Short solution (for your simple problem)

    But here is the (simple, short) answer: if your picture lies on a plane parallel to the "camera plane" (i.e. it is perfectly facing the camera), you can use:

    PH = a AH / Z
    PW = a AW / Z
    

    where Z is the depth of the plane of the picture and a is an intrinsic parameter of the camera.
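    With the numbers from the question, one reference shot is enough to calibrate a: a = PH * Z / AH = 409 * 22.3 / 11 ≈ 829, and a = PW * Z / AW = 319 * 22.3 / 8.5 ≈ 837 (the two estimates differ slightly because of measurement noise). A minimal sketch of this calibrate-then-measure approach (function names are just illustrative):

    // Calibrate a from one reference shot where the depth Z is known,
    // then invert PH = a * AH / Z to get Z = a * AH / PH for new shots.
    double calibrateA(double pixelSize, double actualSize, double knownDepth)
    {
        return pixelSize * knownDepth / actualSize; // a = PH * Z / AH
    }

    double estimateDepth(double a, double actualSize, double pixelSize)
    {
        return a * actualSize / pixelSize; // Z = a * AH / PH
    }

    // Reference shot: 11 in tall, 409 px tall, taken 22.3 in away.
    // double a = calibrateA(409.0, 11.0, 22.3); // ~829.2
    // double z = estimateDepth(a, 11.0, 409.0); // ~22.3 in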

    For reference, the pinhole camera model relates image coordinates m = (u, v) to world coordinates M = (X, Y, Z) with:

    m ~ K M

    [u]   [ au  as  u0 ] [X]
    [v] ~ [  0  av  v0 ] [Y]
    [1]   [  0   0   1 ] [Z]

    [u]   [ au  as ] [X/Z]   [u0]
    [v] = [  0  av ] [Y/Z] + [v0]
    

    where "~" means "proportional to" and K is the matrix of intrinsic parameters of the camera. You need to do camera calibration to find the K parameters. Here I assumed au=av=a and as=0.

    You can recover the Z parameter from either of those equations (or average the two estimates). Note that the Z parameter is not the distance from the object (which varies over the different points of the object) but the depth of the object (the distance between the camera plane and the object plane); but I guess that is what you want anyway. A sketch of the averaging step follows.
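    As a small sketch of that averaging step (reusing a calibrated a as above; the function name is illustrative):

    // Average the two depth estimates from the height and width equations:
    // Z = a * AH / PH and Z = a * AW / PW.
    double estimateDepthAveraged(double a,
                                 double actualH, double pixelH,
                                 double actualW, double pixelW)
    {
        const double zFromHeight = a * actualH / pixelH;
        const double zFromWidth  = a * actualW / pixelW;
        return 0.5 * (zFromHeight + zFromWidth);
    }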

    [1] Linear N-Point Camera Pose Determination, Long Quan and Zhongdan Lan

    [2] A Complete Linear 4-Point Algorithm for Camera Pose Determination, Lihong Zhi and Jianliang Tang

    [3] http://gandalf-library.sourceforge.net/