Tags: opencv, camera, computer-vision, motion, opticalflow

Calculating camera motion from corresponding 3D point sets


I have a small problem. I wrote a program that extracts a set of three-dimensional points in each frame using a camera and depth information. The points are given in the camera coordinate system: the origin is at the camera center, x is the horizontal distance, y the vertical distance, and z the distance from the camera (along the optical axis). Everything is in meters, i.e. the point (2,-1,5) would be two meters right, one meter below and five meters along the optical axis of the camera.

I calculate these points in each time frame and also know the correspondences, i.e. I know which 3D point at time t-1 belongs to which 3D point at time t.

My goal now is to calculate the motion of the camera in each time frame in my world coordinate system (with z pointing up, representing the height). I would like to calculate the relative motion, but also the absolute one starting from some start position, in order to visualize the trajectory of the camera.


This is an example data set from one frame with the current (left) and the previous (right) 3D locations of the points in camera coordinates:

-0.174004 0.242901 3.672510 | -0.089167 0.246231 3.646694 
-0.265066 -0.079420 3.668801 | -0.182261 -0.075341 3.634996 
0.092708 0.459499 3.673029 | 0.179553 0.459284 3.636645 
0.593070 0.056592 3.542869 | 0.675082 0.051625 3.509424 
0.676054 0.517077 3.585216 | 0.763378 0.511976 3.555986 
0.555625 -0.350790 3.496224 | 0.633524 -0.354710 3.465260 
1.189281 0.953641 3.556284 | 1.274754 0.938846 3.504309 
0.489797 -0.933973 3.435228 | 0.561585 -0.935864 3.404614 

Since I would like to work with OpenCV if possible, I found the estimateAffine3D() function in OpenCV 2.3, which takes two input vectors of 3D points and calculates the affine transformation between them using RANSAC.

As output I get a 3x4 transformation matrix.

I already tried to make the calculation more accurate by tuning the RANSAC parameters, but a lot of the time the transformation matrix shows a translatory movement that is quite big. As you can see in the sample data, the actual movement is usually quite small.
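For reference, a minimal sketch of such a call (variable names are placeholders; note that the default ransacThreshold of 3 is in the units of the input points, so with coordinates in meters a much smaller value makes sense):

#include <opencv2/core/core.hpp>
#include <opencv2/calib3d/calib3d.hpp>
#include <vector>
#include <iostream>

int main()
{
    // Corresponding 3D points of the previous and the current frame,
    // filled in from the depth data (values from the sample above).
    std::vector<cv::Point3f> prev, curr;
    prev.push_back(cv::Point3f(-0.089167f, 0.246231f, 3.646694f));
    curr.push_back(cv::Point3f(-0.174004f, 0.242901f, 3.672510f));
    // ... the remaining correspondences ...

    cv::Mat transform;          // resulting 3x4 affine matrix [R|t]
    std::vector<uchar> inliers; // RANSAC inlier mask, one entry per point

    // ransacThreshold = 0.05 treats points deviating by more than 5 cm
    // as outliers; the default of 3 would mean 3 meters here.
    int ok = cv::estimateAffine3D(prev, curr, transform, inliers, 0.05, 0.99);
    if (ok)
        std::cout << transform << std::endl;
    return 0;
}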

So I wanted to ask if anybody has another idea of what I could try. Does OpenCV offer other solutions for this problem?

Also, if I have the relative motion of the camera in each time frame, how would I convert it to world coordinates? And how would I then get the absolute position starting from a point (0,0,0), so that I have the camera position (and orientation) for each time frame?

It would be great if anybody could give me some advice!

Thank you!

UPDATE 1:

After @Michael Kupchick's nice answer I tried to check how well the estimateAffine3D() function in OpenCV works. So I created two small test sets of 6 point pairs that contain just a translation, not a rotation, and had a look at the resulting transformation matrices:

Test set 1:

1.5 2.1 6.7 | 0.5 1.1 5.7
6.7 4.5 12.4 | 5.7 3.5 11.4
3.5 3.2 1.2 | 2.5 2.2 0.2
-10.2 5.5 5.5 | -11.2 4.5 4.5
-7.2 -2.2 6.5 | -8.2 -3.2 5.5
-2.2 -7.3 19.2 | -3.2 -8.3 18.2

Transformation Matrix:

1           -1.0573e-16  -6.4096e-17  1
-1.3633e-16 1            2.59504e-16  1
3.20342e-09 1.14395e-09  1            1

Test set 2:

1.5 2.1 0 | 0.5 1.1 0
6.7 4.5 0 | 5.7 3.5 0
3.5 3.2 0 | 2.5 2.2 0
-10.2 5.5 0 | -11.2 4.5 0
-7.2 -2.2 0 | -8.2 -3.2 0
-2.2 -7.3 0 | -3.2 -8.3 0

Transformation Matrix:

1             4.4442e-17  0   1
-2.69695e-17  1           0   1
0             0           0   0

--> This gives me two transformation matrices that look right at first sight: the rotation part is numerically the identity and the last column matches the applied translation (in the second set the z row is all zeros, which seems plausible since all z coordinates are zero there).

Assuming this is right, how would I recalculate the trajectory of the camera when I have such a transformation matrix for each time step?

Does anybody have any tips or ideas why the results on my real data are still that bad?


Solution

  • This problem is much more related to 3D geometry than to image processing.

    What you are trying to do is register the known 3D point sets, and since the same 3D-points-to-camera relation holds for all frames, the transformations calculated from the registration will be the camera motion transformations.

    In order to solve this you can use PCL, OpenCV's sister project for 3D-related tasks. http://www.pointclouds.org/documentation/tutorials/template_alignment.php#template-alignment is a good tutorial on point cloud alignment.

    Basically it goes like this:

    For each pair of sequential frames the 3D point correspondences are known, so you can use the SVD method implemented in

    http://docs.pointclouds.org/trunk/classpcl_1_1registration_1_1_transformation_estimation_s_v_d.html

    You should have at least 3 corresponding points.
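
    A minimal sketch of how that estimator might be called (cloud names and point values are placeholders; the corresponding points must sit at the same indices in both clouds):

        #include <pcl/point_types.h>
        #include <pcl/point_cloud.h>
        #include <pcl/registration/transformation_estimation_svd.h>

        int main()
        {
            // Corresponding points of the previous (source) and the
            // current (target) frame, stored at the same indices.
            pcl::PointCloud<pcl::PointXYZ> source, target;
            source.push_back(pcl::PointXYZ(-0.089167f, 0.246231f, 3.646694f));
            target.push_back(pcl::PointXYZ(-0.174004f, 0.242901f, 3.672510f));
            // ... at least three non-collinear correspondences in total ...

            // Closed-form least-squares rigid transformation (rotation +
            // translation) between the two point sets, computed via SVD.
            pcl::registration::TransformationEstimationSVD<pcl::PointXYZ,
                                                           pcl::PointXYZ> svd;
            Eigen::Matrix4f transform;
            svd.estimateRigidTransformation(source, target, transform);
            return 0;
        }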

    You can follow the tutorial or implement your own RANSAC algorithm. This will give you only a rough estimate of the transformation (it can be quite good if the noise is not too big). In order to get an accurate transformation you should apply the ICP algorithm, using the guess transformation calculated in the previous step. ICP is described here:

    http://www.pointclouds.org/documentation/tutorials/iterative_closest_point.php#iterative-closest-point
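
    A sketch of feeding the rough estimate into ICP as the initial guess (function and variable names are placeholders):

        #include <pcl/point_types.h>
        #include <pcl/point_cloud.h>
        #include <pcl/registration/icp.h>

        // Refine a rough frame-to-frame estimate with ICP, starting
        // from the guess obtained in the SVD/RANSAC step.
        Eigen::Matrix4f refineWithICP(pcl::PointCloud<pcl::PointXYZ>::Ptr source,
                                      pcl::PointCloud<pcl::PointXYZ>::Ptr target,
                                      const Eigen::Matrix4f &guess)
        {
            pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
            icp.setInputSource(source);  // setInputCloud() in older PCL versions
            icp.setInputTarget(target);
            pcl::PointCloud<pcl::PointXYZ> aligned;
            icp.align(aligned, guess);   // align starting from the guess
            return icp.getFinalTransformation();
        }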

    These two steps should give you an accurate estimation of the transformation between frames.

    So you should do pairwise registration incrementally - registering first pair of frames get the transformation from first frame to the second 1->2. Register the second with third (2->3) and then append the 1->2 transformation to the 2->3 and so on. This way you will get the transformations in the global coordinate system where the first frame is the origin.