Extract projective homography from two Kinect depth maps

Given two consecutive 3D point clouds 1 and 2 (not the whole cloud, say 100 points selected from the cloud with OpenCV's GoodFeaturesToTrack), obtained from a Kinect depth map, I want to compute the camera's homography from 1 to 2. I understand that this is a projective transform, and it has already been done by many people: here (slide 12), here (slide 30) and here in what seems to be the classic paper. My problem is that whilst I'm a competent programmer, I haven't got the math or trig skills to turn one of those methods into code. As this is not an easy problem, I offer a large bounty for the code that solves the following problem:

The camera is at the origin, looking in the Z direction, at irregular pentahedron [A,B,C,D,E,F]: camera position 1

The camera moves -90mm to the left (X), +60mm up (Y), +50mm forwards (Z) and rotates 5° down, 10° right and -3° anticlockwise: camera position 2

Rotating the entire scene so that the camera is back at its original position allows me to determine the vertices' locations at 2:

The 3DS Max files used to prepare this are max 1, max 2 and max 3

Here are the vertices' positions before and after, the intrinsics, etc.: vertices and intrinsics

Note that camera 2's vertices are not 100% accurate; there's a bit of deliberate noise.

here are the numbers in an Excel file

The code I need, which must be readily translatable into VB.Net or C#, using EMGUCV and OpenCV where necessary, takes the 2 sets of vertices and the intrinsics and produces this output:

Camera 2 is at -90 X, +60 Y, +50 Z rotated -5 Y, 10 X, -3 Z.
The homography matrix to translate points in A to B is:
a1, a2, a3
b1, b2, b3
c1, c2, c3

I don't know if the homography is 3x3 or 3x4 for homogeneous coordinates, but it must allow me to translate the vertices from 1 to 2.

I also don't know the values a1, a2, etc; that's what you have to find >;-)
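
On the 3x3 versus 3x4 question above: for 3D points, a rotation plus translation is usually written as a 3x4 matrix [R | t], or padded to 4x4 so that transforms can be chained by matrix multiplication. The sketch below only illustrates that layout. The helper names `make_rigid_transform` and `apply_transform` are made up for this example, and the rotation used is an arbitrary 90° turn about Z, not the answer to the problem.

```python
import numpy as np

def make_rigid_transform(R, t):
    """Pack a 3x3 rotation R and a 3-vector t into a 4x4 homogeneous matrix."""
    T = np.eye(4)
    T[:3, :3] = R   # upper-left block: rotation
    T[:3, 3] = t    # last column: translation
    return T

def apply_transform(T, points):
    """Apply a 4x4 transform to an (N, 3) array of points."""
    points = np.asarray(points, dtype=float)
    # Append w = 1 to every point so the translation column takes effect.
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return (homogeneous @ T.T)[:, :3]

# Illustrative values only: 90 degree rotation about Z, then a translation in mm.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([-90.0, 60.0, 50.0])

T = make_rigid_transform(R, t)
moved = apply_transform(T, [[1.0, 0.0, 0.0]])
print(moved)
```

The top three rows of `T` are the 3x4 form; the extra bottom row `[0, 0, 0, 1]` carries no information and only makes the matrix square.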

The 500 bounty offer 'replaces' the bounty I offered to this very similar question; I've added a comment there pointing to this question.

EDIT2: I'm wondering if the way I'm asking this question is misleading. It seems to me that the problem is more one of point-cloud fitting than of camera geometry (if you know how to translate and rotate A to B, you know the camera transform and vice versa). If so, then perhaps the solution could be obtained with Kabsch's algorithm or something similar.
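
For reference, a minimal sketch of Kabsch's algorithm in Python with NumPy, assuming the two point sets are already paired (row i of the first set corresponds to row i of the second). The function name `kabsch` and the synthetic test data are illustrative; the data is not the vertex set from the question, and the 10° rotation about Y is chosen arbitrarily.

```python
import numpy as np

def kabsch(P, Q):
    """Return (R, t) minimising sum ||R @ P[i] + t - Q[i]||^2 for paired (N, 3) arrays."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    p_mean = P.mean(axis=0)
    q_mean = Q.mean(axis=0)
    # 3x3 cross-covariance of the centred point sets.
    H = (P - p_mean).T @ (Q - q_mean)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (determinant -1) in the SVD solution.
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

# Synthetic check: 100 random points moved by a known rotation and translation.
rng = np.random.default_rng(0)
P = rng.uniform(-500.0, 500.0, size=(100, 3))
angle = np.radians(10.0)
R_true = np.array([[ np.cos(angle), 0.0, np.sin(angle)],
                   [ 0.0,           1.0, 0.0          ],
                   [-np.sin(angle), 0.0, np.cos(angle)]])
t_true = np.array([-90.0, 60.0, 50.0])
Q = P @ R_true.T + t_true

R_est, t_est = kabsch(P, Q)
print(np.round(R_est, 4))
print(np.round(t_est, 4))
```

The recovered `R_est` and `t_est` map points of set 1 onto set 2. If both sets are expressed in the camera's own frame and the scene is static, the camera's motion is the inverse of that transform: rotation `R_est.T` and translation `-R_est.T @ t_est`.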


  • "The correct" algorithm to use for computing the motion between two snapshots of 2D or 3D point clouds is called ICP (Iterative Closest Point). It solves a least-squares alignment problem.

    In human-readable form: for given point sets P1 and P2, find the rotation matrix R and translation T that transform P1 to P2. Just make sure both sets are centered on their own centroids first.

    The algorithm is conceptually simple and is commonly used in real time. It iteratively revises the transformation (translation, rotation) needed to minimize the distance between the points of two raw scans.

    For those interested, this is a topic within Computational Geometry Processing.
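
The iteration described above can be sketched roughly as follows in Python with NumPy. This is a bare-bones point-to-point ICP with brute-force nearest-neighbour matching, not a production implementation; the names `best_fit` and `icp`, the iteration limit and the tolerance are all assumptions made for the example. The test data is a synthetic 3x3x3 grid moved by a small, arbitrarily chosen motion, because plain ICP only converges reliably when the two clouds start close together.

```python
import numpy as np

def best_fit(P, Q):
    """Least-squares rotation and translation for paired (N, 3) arrays (SVD based)."""
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    U, _, Vt = np.linalg.svd((P - p_mean).T @ (Q - q_mean))
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0  # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, q_mean - R @ p_mean

def icp(source, target, iterations=20, tol=1e-9):
    """Align source to target without known correspondences; returns (R, t)."""
    src = np.asarray(source, dtype=float).copy()
    tgt = np.asarray(target, dtype=float)
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(iterations):
        # Match every source point to its closest target point (brute force).
        d2 = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        err = np.sqrt(d2[np.arange(len(src)), nearest]).mean()
        # Solve for the best rigid motion given the current matches, then apply it.
        R, t = best_fit(src, tgt[nearest])
        src = src @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
        if abs(prev_err - err) < tol:
            break  # mean matching distance has stopped improving
        prev_err = err
    return R_total, t_total

# Synthetic data: a 3x3x3 grid with 100 mm spacing, moved slightly and reordered.
axis = (-100.0, 0.0, 100.0)
grid = np.array([[x, y, z] for x in axis for y in axis for z in axis])
angle = np.radians(2.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([2.0, -1.0, 1.5])
target = (grid @ R_true.T + t_true)[::-1]  # reversed so row order gives no hint

R_est, t_est = icp(grid, target)
print(np.round(R_est, 4))
print(np.round(t_est, 4))
```

When the correspondences are already known, as with tracked feature points, the matching step is unnecessary and a single call to `best_fit` on the paired points is enough.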