Tags: coordinates, projection-matrix, lidar, kitti

Projection of 3D Lidar point in the i-th camera image (KITTI Dataset)


I am working on an object classification problem using LiDAR and camera data from the KITTI dataset. The paper at http://www.cvlibs.net/publications/Geiger2013IJRR.pdf gives the formulas for projecting the 3D point cloud onto the i-th camera image plane, but there are some things I don't understand:

Regarding equation (3):

If the 3D point X is in the Velodyne frame and Y is in the i-th camera image, why does X have four coordinates and Y three? Shouldn't they have three and two?

[Image: formula (source: noelshack.com)]

I need to project the 3D point cloud onto the camera image plane so that I can create LiDAR images and use them as an input channel for a CNN. Does anyone have ideas on how to do this?

Thank you in advance


Solution

  • Regarding your first question about the dimensions of x and y, there are two explanations.

    Reason 1.

    • Image projection uses the pinhole camera model, which is expressed in homogeneous (projective) coordinates. Perspective projection uses the camera origin as the centre of projection, and points are mapped onto the plane z = 1. A 3D point (x, y, z) is represented by [xw yw zw w]^T, and a 2D image point (x, y) is represented by [xw yw w]^T. Normalising by w gives:

      (x, y) -> [x y 1]^T : Homogeneous Image Coordinates

      and (x, y, z) -> [x y z 1]^T : Homogeneous Scene Coordinates
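The conversion to and from homogeneous coordinates can be sketched in a few lines of NumPy. The helper names `to_homogeneous` and `from_homogeneous` are hypothetical and chosen only for this illustration; they are not part of KITTI or any library.

```python
import numpy as np

def to_homogeneous(points):
    """Append a 1 to every row: (N, d) -> (N, d + 1)."""
    points = np.asarray(points, dtype=float)
    ones = np.ones((points.shape[0], 1))
    return np.hstack([points, ones])

def from_homogeneous(points_h):
    """Divide by the last component w and drop it: (N, d + 1) -> (N, d)."""
    points_h = np.asarray(points_h, dtype=float)
    return points_h[:, :-1] / points_h[:, -1:]

# A 3D scene point gains a fourth coordinate.
scene_h = to_homogeneous([[2.0, 4.0, 8.0]])      # [[2, 4, 8, 1]]

# A homogeneous image point [xw, yw, w] with w = 3 normalises to (x, y).
image_xy = from_homogeneous([[6.0, 3.0, 3.0]])   # [[2, 1]]
```

This is why the 3D point carries four numbers and the image point three: the extra component is the scale w, which is divided out at the end.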

    Reason 2.

    • With respect to the paper you linked, consider equations (4) and (5):

      [Image: equation (4) from the paper]

      [Image: equation (5) from the paper]

      P has dimension 3x4 and R is expanded to 4x4, and x is a column vector of dimension 4x1. Matrix multiplication requires the number of columns of the first matrix to equal the number of rows of the second, so for the given 3x4 P and 4x4 R, x has to be 4x1, and the result y = P R x is 3x1.
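The dimension argument can be checked directly with NumPy. The matrices below hold placeholder values (zeros and an identity), not real KITTI calibration data; only their shapes matter here.

```python
import numpy as np

P = np.zeros((3, 4))                          # projection matrix, 3x4 (placeholder values)
R = np.eye(4)                                 # rectifying rotation expanded to 4x4
x = np.array([[1.0], [2.0], [3.0], [1.0]])    # homogeneous 3D point, 4x1

y = P @ R @ x
result_shape = y.shape                        # (3, 1): a homogeneous image point

# A plain 3x1 point cannot be multiplied by the 4x4 matrix.
x_bad = np.array([[1.0], [2.0], [3.0]])
try:
    P @ R @ x_bad
    shapes_compatible = True
except ValueError:
    shapes_compatible = False
```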

    Now to your second question about LiDAR-image fusion. It requires the extrinsic parameters (the relative rotation and translation between the LiDAR and the camera) and the intrinsic parameters (the camera matrix). The rotation and translation form a 3x4 matrix called the transformation matrix. The point fusion equation then becomes

    s * [x y 1]^T = Camera Matrix * Transformation Matrix * [X Y Z 1]^T

    where s is the depth of the point in the camera frame; dividing by s gives the pixel coordinates (x, y).
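As a minimal sketch of this projection, the function below applies the camera matrix after the rigid transformation and then divides by depth. It follows the KITTI-style chain with a 3x4 projection matrix, a 4x4 rectifying rotation and a 4x4 LiDAR-to-camera transform. The function name `project_velo_to_image` and the toy calibration values are assumptions made for illustration; real values come from the dataset's calibration files.

```python
import numpy as np

def project_velo_to_image(points_velo, P_rect, R_rect, T_velo_cam):
    """Project (N, 3) LiDAR points to pixel coordinates.

    P_rect: (3, 4) projection matrix, R_rect: (4, 4) expanded rectifying
    rotation, T_velo_cam: (4, 4) rigid transform from LiDAR to camera.
    Returns (M, 2) pixel coordinates and (M,) depths for the points
    that lie in front of the camera.
    """
    n = points_velo.shape[0]
    x_h = np.hstack([points_velo, np.ones((n, 1))])      # (N, 4) homogeneous points
    y_h = (P_rect @ R_rect @ T_velo_cam @ x_h.T).T        # (N, 3) homogeneous pixels
    depth = y_h[:, 2]
    in_front = depth > 0                                  # drop points behind the camera
    uv = y_h[in_front, :2] / depth[in_front, None]        # normalise by depth
    return uv, depth[in_front]

# Toy calibration (assumed values): focal length 100 px, principal point (50, 40),
# and identity transforms, so the points are already in the camera frame.
P_rect = np.array([[100.0, 0.0, 50.0, 0.0],
                   [0.0, 100.0, 40.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
R_rect = np.eye(4)
T_velo_cam = np.eye(4)

points = np.array([[1.0, 2.0, 10.0],     # in front of the camera
                   [0.0, 0.0, -5.0]])    # behind the camera, filtered out
uv, depth = project_velo_to_image(points, P_rect, R_rect, T_velo_cam)
```

With real data you would additionally discard pixels that fall outside the image bounds.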
    

    You can also refer to: Lidar Image Fusion KITTI

    Once the LiDAR-image fusion is done, you can feed the resulting image to your CNN model. I am not aware of any DNN modules built specifically for LiDAR-fused images.
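One possible way to turn projected points into an image channel is to rasterise them into a sparse depth map. The sketch below assumes you already have pixel coordinates and depths for each point; the function name `make_depth_image` and the convention that 0 means "no LiDAR return" are choices made for this example, not something prescribed by the dataset.

```python
import numpy as np

def make_depth_image(uv, depth, height, width):
    """Rasterise projected points into a (height, width) depth map.

    uv: (N, 2) pixel coordinates (column, row), depth: (N,) depths.
    Pixels without any LiDAR return are set to 0.
    """
    cols = np.round(uv[:, 0]).astype(int)
    rows = np.round(uv[:, 1]).astype(int)
    inside = (cols >= 0) & (cols < width) & (rows >= 0) & (rows < height)

    image = np.full((height, width), np.inf)
    # Keep the nearest point when several points land on the same pixel.
    np.minimum.at(image, (rows[inside], cols[inside]), depth[inside])
    image[np.isinf(image)] = 0.0
    return image

uv_demo = np.array([[1.2, 0.8],     # lands on pixel (row 1, col 1)
                    [1.4, 1.1],     # same pixel, but closer
                    [10.0, 10.0]])  # outside the image, ignored
depth_demo = np.array([5.0, 3.0, 2.0])
depth_image = make_depth_image(uv_demo, depth_demo, height=3, width=4)
```

The resulting array can be stacked with the RGB channels of the camera image before it is passed to the network.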

    Hope this helps.