Search code examples
pythonimage-processingpoint-cloudsdepthlidar

Inpainting of sparse 2D LiDAR image to dense depth image


I'm working on a classification problem (object classification for autonomous vehicle). I use a dataset from KITTI which provide LiDAR and camera data and want to use both of this data to perform the task.

3D LIDAR data is projected onto the coordinate system of the RGB image resulting in a sparse LiDAR image:

sparse

original_image

Each pixel is encoded using depth (distance to the point : sqrt(X² + Y²), scaling between 0 and 255).

In order to obtain better results for my CNN, I need a dense LiDAR image, anyone know how to do it using Python?

I would like to obtain something like this

goal


Solution

  • I've never worked with point-cloud data/LIDAR before, but as nobody has answered yet, I'll give it my best shot. I'm not sure about inpainting approaches per-say, though I imagine they might not work very well (except for maybe a variational method, which I presume would be quite slow). But if your goal is to project the 3D LIDAR readings (when accompanied by ring ids and laser intensity readings) into a dense 2D matrix (for use in a CNN), the following reference might prove useful. Additionally, in this paper they reference a previous work (Collar Line Segments for Fast Odometry Estimation from Velodyne Point Clouds) which covers the technique of polar binning in more detail, and has C++ code available. Check out the papers, but I'll try and summarize the technique here:

    Encoding Sparse 3D Data with Polar Binning

    CNN for Very Fast Ground Segmentation in Velodyne LiDAR Data - Describes its preprocessing technique in section III.A (Encoding Sparse 3D Data Into a Dense 2D Matrix). enter image description here

    • 1) Let P represent your original point cloud, and M the multi-channel dense matrix you are hoping to output. The size of M depends on the number of laser beams used in scanning and the horizontal angular resolution of the scanner.
    • 2) Aggregate the point cloud data into polar bins b(r, c), where r represents the ring id and c = floor((R * atan(x/z) + 180)/360).
    • 3) Use the following mapping to map the bin b(r, c) to the corresponding value in the matrix M, m(r, c), where p^i is the laser intensity reading:

    enter image description here

    • 4) In the case of empty bins, linearly interpolate the value of m(r,c) from its neighborhood.

    Improving performance of sparse mapping

    Finally, looking at the following paper, they introduce some techniques for using the sparse Velodyne readings in a CNN. Maybe see if any of these improve your performance?

    Vehicle Detection from 3D Lidar Using Fully Convolutional Network - Describes its preprocessing technique in section III.A (Data Preparation).

    Encoding the range data as a 2-channel image

    • 1) Initialize a 2-channel matrix I; Fill with zeros
    • 2) Given coordinates (x, y, z), let theta = atan2(y, x) and let phi = arcsin(z/sqrt(x^2 + y^2 + z^2))
    • 3) Let delta_theta, delta_phi equal the average horizontal and vertical resolution between consecutive beam emitters, respectively.
    • 4) Let r = floor(theta/delta_theta); Let c = floor(phi/delta_phi)
    • 5) Let d = sqrt(x^2 + y^2)
    • 6) Let I(r, c) = (d, z); if two points projected into the same position (rare), keep the one nearer to the observer

    Unequal (Up/Down)sampling

    • In the first convolutional layer, the authors downsample by 4 horizontally and 2 vertically; This is because for Velodyne point maps, points are denser in the horizontal layer. They upsample by this same factor in their final deconvolutional layers (which simultaneously predict a vehicle's 'objectness' and its bounding box).

    All techniques are implemented with respect to the KITTI dataset/Velodyne LIDAR, so I imagine they could work (perhaps with some modification) for your particular use-case.