Tags: python, opencv, computer-vision, projection, homography

Finding the mapping between video points and real-world points


I am doing car tracking on a video, and I am trying to determine how many meters the car traveled.

I randomly pulled 7 points from a video frame and made point 1 my origin.

Then, on the corresponding Google Maps view, I calculated the distances of the other 6 points from the origin (delta x and delta y).

Then I ran the following

    import cv2
    import numpy as np

    # Pixel coordinates of the marked points in the video frame
    pts_src = np.array([[417, 285], [457, 794], [1383, 786], [1557, 423], [1132, 296], [759, 270], [694, 324]])

    # Corresponding positions in meters, relative to the origin (point 1)
    pts_dst = np.array([[0, 0], [-3, -31], [30, -27], [34, 8], [17, 15], [8, 7], [6, 1]])

    # Estimate the pixel -> meter homography
    h, status = cv2.findHomography(pts_src, pts_dst)

    # Point to map; perspectiveTransform expects an array of shape (1, N, 2)
    a = np.array([[[1032, 268]]], dtype='float32')

    # finally, get the mapping
    pointsOut = cv2.perspectiveTransform(a, h)

When I tested the mapping of point 7, the result was wrong.

Am I missing anything? Or am I using the wrong method? Thank you

Here is the frame from the video: [image: video frame]

I have marked the points, and here is the mapping: [image: table of marked points with their pixel and meter coordinates]

The x, y columns are the pixel coordinates on the image. The metered column is the distance from the origin to the point, in meters. Basically, using Google Maps, I converted the geo-coordinates to UTM and calculated the x and y differences.
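Roughly, the conversion looks like the sketch below (this uses the `utm` package as just one possible choice, and the latitude/longitude values are placeholders, not my real data):

    import utm

    # Placeholder lat/lon values read off Google Maps (not the real points)
    origin_latlon = (40.0000, -75.0000)
    point_latlon = (40.0003, -74.9996)

    # Convert both to UTM so the coordinates are in meters
    ox, oy, zone, band = utm.from_latlon(*origin_latlon)
    px, py, _, _ = utm.from_latlon(*point_latlon)

    # Delta x / delta y of the point from the origin, in meters
    dx, dy = px - ox, py - oy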

I tried to input the 7th point and got [[[14.682752 9.927497]]] as output, which is quite far off along the x axis.

Any idea if I am doing anything wrong?


Solution

  • Cameras are not ideal pinhole cameras and therefore the homography cannot capture the real transform.

    For small-angle cameras the results are quite close, but for a fish-eye camera the result can be far off.

    Also, in my experience, the theoretical lens-distortion models found in the literature are not very accurate with real-world lenses (multi-element designs that do "strange" things to compensate for barrel/pincushion distortion). Non-spherical lenses are also viable today, and with those the transformation can be just about anything.

    To get accurate results, the only solution I found was actually mapping the transformation function using an interpolating spline.
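    As a rough sketch of the idea (SciPy's RBFInterpolator is used here purely as an example of an interpolating spline fit; any thin-plate or bivariate spline would do), the pixel-to-world mapping can be fitted directly from the calibration points:

        import numpy as np
        from scipy.interpolate import RBFInterpolator

        # Calibration data: pixel coordinates and the corresponding positions in meters
        # (here simply the seven points from the question)
        pix = np.array([[417, 285], [457, 794], [1383, 786], [1557, 423],
                        [1132, 296], [759, 270], [694, 324]], dtype=float)
        world = np.array([[0, 0], [-3, -31], [30, -27], [34, 8],
                          [17, 15], [8, 7], [6, 1]], dtype=float)

        # Thin-plate-spline interpolation of the pixel -> world mapping
        pix_to_world = RBFInterpolator(pix, world, kernel='thin_plate_spline')

        # Map a new pixel to world coordinates (meters from the origin)
        print(pix_to_world(np.array([[1032, 268]], dtype=float)))

    With only seven calibration points this will behave poorly outside their convex hull, so in practice you need many more, well-spread points.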

    EDIT

    In your case I'd say the problem is in the input data. Consider the quasi-quadrilateral formed by points 6, 3, 1 and 2:

    [image: the quasi-quadrilateral marked on the video frame]

    If the A-D distance is 36.9 meters, how can the B-C distance be 53.8 meters?

    Maybe the problem is in how you collected the data, or Google Maps shouldn't be considered reliable for such small measurements.

    A solution could be to just measure the relative distances between the points and then find their coordinates on the plane by solving from that distance matrix (see the sketch below).
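    A minimal sketch of that idea, assuming the measured pairwise distances are collected into a full symmetric matrix (classical multidimensional scaling recovers planar coordinates up to a rigid transform; the distances below are made up for illustration):

        import numpy as np

        def coords_from_distances(D, dim=2):
            # Classical MDS: recover point coordinates (up to rotation,
            # reflection and translation) from a matrix of pairwise distances
            n = D.shape[0]
            J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
            B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
            w, V = np.linalg.eigh(B)
            idx = np.argsort(w)[::-1][:dim]          # keep the largest eigenvalues
            return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))

        # Hypothetical tape-measured distances between 3 points, in meters (a 6-8-10 triangle)
        D = np.array([[0.0, 10.0, 8.0],
                      [10.0, 0.0, 6.0],
                      [8.0, 6.0, 0.0]])
        print(coords_from_distances(D))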

    EDIT

    To check, I wrote a simple non-linear least squares solver (it works by stochastic hill climbing) and tested it using a picture of my floor. After a few seconds (it's written in Python, so speed is not its best feature) it can solve a general pinhole planar camera equation:

     pixel_x = (world_x*m11 + world_y*m12 + m13) / w
     pixel_y = (world_x*m21 + world_y*m22 + m23) / w
     w = (world_x*m31 + world_y*m32 + m33)
    
     m11**2 + m12**2 + m13**2 = 1
    

    and I can get a camera with less than 4 pixels of maximum error (on a 4k image).

    [image: the solved camera tested on the floor picture]
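    A stripped-down sketch of what such a stochastic hill-climbing solver can look like (this is only an illustration of the approach, not the actual solver used for the results above):

        import numpy as np

        def reproject(M, world):
            # Planar camera equation from above: world coordinates (meters) -> pixels
            w = world[:, 0] * M[2, 0] + world[:, 1] * M[2, 1] + M[2, 2]
            px = (world[:, 0] * M[0, 0] + world[:, 1] * M[0, 1] + M[0, 2]) / w
            py = (world[:, 0] * M[1, 0] + world[:, 1] * M[1, 1] + M[1, 2]) / w
            return np.stack([px, py], axis=1)

        def hill_climb(world, pixels, iters=200000, step=1e-3, seed=0):
            # Stochastic hill climbing on the 9 entries of M, minimizing the
            # maximum reprojection error in pixels
            rng = np.random.default_rng(seed)
            M = np.eye(3)
            best = np.abs(reproject(M, world) - pixels).max()
            for _ in range(iters):
                cand = M + rng.normal(scale=step, size=(3, 3))
                cand[0] /= np.linalg.norm(cand[0])   # enforce m11**2 + m12**2 + m13**2 = 1
                err = np.abs(reproject(cand, world) - pixels).max()
                if err < best:
                    M, best = cand, err
            return M, best

    In practice you would seed it with a reasonable initial guess (for example the cv2.findHomography estimate) and shrink the step size as the error drops; the plain random walk above is only meant to show the idea.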

    With YOUR data, however, I cannot get an error smaller than 120 pixels. The best matrix I found for your data is:

    0.0704790534896005     -0.0066904288370295524   0.9974908226049937
    0.013902632209214609   -0.03214426521221147     0.6680756144949469
    6.142954035443663e-06  -7.361135651590592e-06   0.002007213927080277
    

    Solving with only points 1, 2, 3 and 6 from your data, I of course get an exact numeric solution (with four points in general position there is exactly one planar camera), but the result is clearly completely wrong (the grid should lie on the street plane):

    [image: grid from the four-point solution drawn on the frame; it does not lie on the street plane]
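    For reference, the exact four-point solution can be reproduced directly with OpenCV (assuming the question's points are listed in order 1-7, so points 1, 2, 3 and 6 are the first, second, third and sixth pairs; the direction here is meters to pixels, matching the camera equation above):

        import numpy as np
        import cv2

        # Points 1, 2, 3 and 6 from the question: meter coordinates and pixel coordinates
        world = np.array([[0, 0], [-3, -31], [30, -27], [8, 7]], dtype=np.float32)
        pix = np.array([[417, 285], [457, 794], [1383, 786], [759, 270]], dtype=np.float32)

        # With exactly four points in general position the planar camera (homography)
        # is determined exactly, so the residual on these four points is zero
        h = cv2.getPerspectiveTransform(world, pix)
        print(h)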