python numpy opencv lidar realsense

Translate Region of Interest from RGB video to Depth video using OpenCV


I recently purchased the L515 RealSense camera, which has an RGB sensor and a depth (lidar) sensor. I also have a pre-trained model that detects hands, but only in RGB images, and I would like to translate the detected Region of Interest to the depth image. Unfortunately, because of an offset between the two sensors, the image feeds do not line up exactly, which makes this translation difficult.

I wrote a simple GUI script that allows me to pick 3 points (white) in each image feed and calculate an Affine Transformation matrix that can then be applied to line up the images.

[Image: Point Selection]

However, the results have been unsuccessful.

[Image: Affine Transformation and Translated ROI]

My guess is that this has to do with a difference in focal length between the two cameras. I'm wondering if there's anything I can do in OpenCV to better align the images.


Solution

  • By far the easiest way is to use the 'align' process in the librealsense API; see the example at https://github.com/IntelRealSense/librealsense/blob/master/wrappers/python/examples/align-depth2color.py. The important parts:

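    import pyrealsense2 as rs

    # Start streaming depth and color
    pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth)
    config.enable_stream(rs.stream.color)
    pipeline.start(config)

    # Align depth frames to the color stream's viewpoint
    align = rs.align(rs.stream.color)

    # Streaming loop
    try:
        while True:
            # Get frameset of color and depth
            frames = pipeline.wait_for_frames()
            # frames.get_depth_frame() is the raw, unaligned depth image

            # Align the depth frame to color frame
            aligned_frames = align.process(frames)

            # Get aligned frames
            aligned_depth_frame = aligned_frames.get_depth_frame()
            color_frame = aligned_frames.get_color_frame()

            # Skip framesets where either frame is missing
            if not aligned_depth_frame or not color_frame:
                continue
    finally:
        pipeline.stop()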

    Once you do this, the colour and depth images are spatially aligned and have the same resolution, so an (i,j) pixel coordinate in the colour image maps directly to (i,j) in the depth image. Not all colour pixels will have valid depth data, though; pixels without a measurement report a depth of 0.
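
With aligned frames, the ROI from the RGB detector can be used to index the depth image directly. Below is a minimal sketch of that lookup, not part of the librealsense example: the function name `roi_depth_m` and the `(x, y, w, h)` box layout are assumptions made for illustration. Note that NumPy indexes images as `[row, column]`, i.e. `[y, x]`.

```python
import numpy as np


def roi_depth_m(depth_image, box, depth_scale):
    """Median depth in meters inside box = (x, y, w, h).

    depth_image: 2-D array of raw depth units (hypothetical input, e.g. the
                 aligned depth frame converted to a NumPy array).
    depth_scale: meters per raw depth unit, as reported by the device.
    Zero-valued pixels are treated as invalid and ignored.
    Returns None if the ROI holds no valid depth.
    """
    x, y, w, h = box
    rows, cols = depth_image.shape[:2]

    # Clip the box to the image bounds
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, cols), min(y + h, rows)
    if x0 >= x1 or y0 >= y1:
        return None

    roi = depth_image[y0:y1, x0:x1]
    valid = roi[roi > 0]  # drop pixels with no depth measurement
    if valid.size == 0:
        return None
    return float(np.median(valid)) * depth_scale
```

In the streaming loop, the depth array would come from `np.asanyarray(aligned_depth_frame.get_data())`, and the scale from the depth sensor's `get_depth_scale()`, as the linked example does. The median is used here only because it tolerates a few background pixels inside the box; a mean or a minimum may suit other cases better.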