math computer-vision coordinates object-detection tracking

Convert 2D coordinates to another 2D coordinate system

I am working on a program for tracking people in CCTV footage. I have the video input from a CCTV camera and a 2D top-view image of the building that shows all of the CCTV cameras. When I detect a person in the scene, I draw a bounding box around them, and this bounding box has a center point (x, y) in image coordinates. What I want to do is convert this center point to the corresponding 2D point in the building image's coordinate system. Can anyone give me a hint or an idea?

Here are the image from the CCTV camera and the image of the building. The red line in the CCTV image is the line I have, and the red line in the building image is the line I want to obtain.

CCTV camera image:

Building image:

Solution

Because the camera projects the 3D world down to 2D, and you then want to recover a position from a different 2D perspective, there is some ambiguity. For instance, the pixel at the center of a person's bounding box could correspond to a point about 2.7 ft off the ground where the person is standing, or, if the person weren't there, to a point on the floor. Those would be two very different places on your bird's-eye-view map.

However, since this is for tracking people, you can assume that everyone is approximately the same height, so the center of the rectangle is always roughly 2.7 ft off the ground. With that assumption, the problem becomes much more tractable.

Given that assumption, you could add a calibration phase. Have a person stand at the far end of the hallway and record their coordinates both in the camera image and on the map. Then have them walk up in front of the camera and record both sets of coordinates again. With these two pairs of points you can do a linear interpolation to work out where in the hallway someone is, based on their position in the camera image. You would need to repeat this calibration for each camera you have, but it should give fairly accurate results.
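The two-point interpolation can be sketched in Python as follows. This is a minimal sketch, assuming the hallway appears as a roughly straight segment in both images; projecting the detection onto the camera-image segment to get a fraction `t`, and the function name `interpolate_to_map`, are illustrative choices rather than part of the original answer.

```python
def interpolate_to_map(p, cam1, cam2, map1, map2):
    """Map camera point p to the building map by two-point linear interpolation.

    cam1/map1: calibration point at the far end of the hallway (camera, map).
    cam2/map2: calibration point close to the camera (camera, map).
    Assumption (not from the original answer): the person stays on the
    straight line cam1 -> cam2, so p is projected onto that line.
    """
    # Direction of the hallway in camera coordinates
    dx, dy = cam2[0] - cam1[0], cam2[1] - cam1[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        raise ValueError("calibration points must differ in the camera image")
    # Fraction t of the way from cam1 to cam2 (projection of p onto the segment)
    t = ((p[0] - cam1[0]) * dx + (p[1] - cam1[1]) * dy) / length_sq
    # Move the same fraction of the way from map1 to map2
    return (map1[0] + t * (map2[0] - map1[0]),
            map1[1] + t * (map2[1] - map1[1]))
```

For example, with hypothetical calibration values `cam1=(320, 100)`, `cam2=(320, 400)`, `map1=(50, 10)` and `map2=(50, 110)`, a detection at `(320, 250)` lies halfway along the hallway in the image and maps to `(50.0, 60.0)`. Interpolating linearly in pixel space does not account for perspective (distant parts of the hallway are compressed in the camera image), so positions between the two calibration points are only approximate.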

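Let $(x_1, y_1)$ be the camera coordinates at the end of the hall and $(X_1, Y_1)$ the map coordinates at the end of the hall. Then let $(x_2, y_2)$ be the camera coordinates when the person is close to the camera and $(X_2, Y_2)$ the corresponding map coordinates. Now find a linear map $A$ (a 2×2 matrix) such that

$$A\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} = \begin{pmatrix} X_1 \\ Y_1 \end{pmatrix} \quad\text{and}\quad A\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} X_2 \\ Y_2 \end{pmatrix}$$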

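You can solve this as a single matrix equation by putting the two points into the columns of a matrix:

$$A\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix} = \begin{pmatrix} X_1 & X_2 \\ Y_1 & Y_2 \end{pmatrix}$$

$$A = \begin{pmatrix} X_1 & X_2 \\ Y_1 & Y_2 \end{pmatrix}\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}^{-1}$$

The inverse exists as long as $x_1 y_2 - x_2 y_1 \neq 0$, i.e. the two camera points do not lie on the same line through the image origin.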

And this should give you a decent way to convert coordinates on the camera to coordinates on the map.
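The matrix formula above can be sketched in plain Python without external libraries. This is a minimal sketch: the function names `solve_linear_map` and `apply_linear_map` are hypothetical, and the camera point passed in is assumed to be the bounding-box center under the constant-height assumption.

```python
def solve_linear_map(cam1, cam2, map1, map2):
    """Return the 2x2 matrix A with A @ cam1 = map1 and A @ cam2 = map2.

    Implements A = [[X1, X2], [Y1, Y2]] @ inverse([[x1, x2], [y1, y2]]).
    """
    (x1, y1), (x2, y2) = cam1, cam2
    (X1, Y1), (X2, Y2) = map1, map2
    det = x1 * y2 - x2 * y1
    if det == 0:
        raise ValueError("camera points are linearly dependent; A is not unique")
    # Inverse of the camera matrix [[x1, x2], [y1, y2]]
    inv = [[y2 / det, -x2 / det],
           [-y1 / det, x1 / det]]
    # A = [[X1, X2], [Y1, Y2]] @ inv
    return [[X1 * inv[0][0] + X2 * inv[1][0], X1 * inv[0][1] + X2 * inv[1][1]],
            [Y1 * inv[0][0] + Y2 * inv[1][0], Y1 * inv[0][1] + Y2 * inv[1][1]]]


def apply_linear_map(A, p):
    """Convert a camera point p = (x, y) to map coordinates A @ p."""
    return (A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1])
```

With hypothetical calibration pairs `(2, 1) -> (5, 0)` and `(1, 3) -> (0, 5)`, `solve_linear_map` returns approximately `[[3, -1], [-1, 2]]`, and `apply_linear_map` then sends a new detection at `(3, 4)` to approximately `(5, 5)`. Because $A$ is a pure 2×2 linear map with no translation term, camera point $(0, 0)$ always maps to map point $(0, 0)$, and the result is exact only at the two calibration points.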