Tags: opencv, image-processing, computer-vision, stereo-3d, perspective, camera

Merging depth maps for trinocular stereo


I have a parallel trinocular setup where all 3 cameras are aligned collinearly, as depicted below.

Left-Camera------------Centre-Camera---------------------------------Right-Camera

The baseline (distance between cameras) is shortest between the left and centre cameras and longest between the left and right cameras.

In theory I can obtain 3 disparity images using the different camera combinations (L-R, L-C and C-R). I can generate a depth map (3D points) from each disparity map using triangulation, so I end up with 3 depth maps.
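
A minimal sketch of this step for one pair (L-C), assuming rectified images and using OpenCV's SGBM matcher; the file names, focal length f and baseline B below are placeholder values, not taken from the question:

    import cv2
    import numpy as np

    # Load one rectified pair (placeholder file names)
    left   = cv2.imread("left.png",   cv2.IMREAD_GRAYSCALE)
    centre = cv2.imread("centre.png", cv2.IMREAD_GRAYSCALE)

    matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
    disp = matcher.compute(left, centre).astype(np.float32) / 16.0  # SGBM output is fixed-point

    f, B = 1000.0, 0.10                            # focal length [px], baseline [m] (assumed)
    depth = np.where(disp > 0, f * B / disp, 0.0)  # Z = f*B/d; 0 marks invalid pixels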

The L-C combination has higher depth accuracy (the measured distance is more accurate) for near objects (since the baseline is short), whereas the L-R combination has higher depth accuracy for far objects (since the baseline is long). Similarly, the C-R combination is most accurate for objects at medium distance.
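
The baseline's effect on far-range accuracy can be made concrete with the standard first-order error model for stereo triangulation, dZ ≈ Z^2 * dd / (f * B), where f is the focal length in pixels, B the baseline and dd the disparity (matching) error. A small sketch with purely illustrative numbers (f, dd, the baselines and the distances are all assumptions):

    # First-order depth-error model: dZ ≈ Z**2 * dd / (f * B)
    f, dd = 1000.0, 0.5              # focal length [px], disparity error [px] (assumed)
    for B in (0.1, 0.6):             # e.g. a short (L-C) and a long (L-R) baseline [m]
        for Z in (2.0, 20.0):        # a near and a far object [m]
            print(f"B={B:.1f} m, Z={Z:4.1f} m -> depth error ~ {Z**2 * dd / (f * B):.3f} m")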

In stereo setups we normally define the left (RGB) image as the reference image. In my project, I obtain an ROI on the reference image by thresholding the depth values. For example, I find all the pixels that have a depth value between 10 and 20 m and record their pixel locations. This gives me a relationship between 3D points and their corresponding pixel locations.
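
A sketch of this thresholding step, assuming depth is an (H, W) array aligned with the left image (like the one computed above):

    import numpy as np

    mask = (depth >= 10.0) & (depth <= 20.0)   # ROI: pixels whose depth is 10-20 m
    rows, cols = np.nonzero(mask)              # their pixel locations in the left image
    roi_depths = depth[rows, cols]             # the depth value at each ROI pixel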

Since a normal stereo setup gives higher depth accuracy in only one of the two regions (near or far), depending on the baseline, I plan on using 3 cameras. This lets me generate 3D points of higher accuracy for three regions (near, medium and far).

I now want to merge the 3 depth maps to obtain a global map. My problems are as follows:

  1. How do I merge the three depth maps?
  2. After merging, how do I know which depth value corresponds to which pixel location in the reference (left RGB) image?

Your help will be much appreciated :)


Solution

  • 1) I think that a simple "merging" of the depth maps (as matrices of values) is not possible, if you are thinking of a global 2D depth map as an image or a matrix of depth values. You can instead merge the 3 sets of 3D points using a similarity criterion such as distance (refining your point cloud). If two points are too close, delete one of them, for example:

    import numpy as np

    keep = np.ones(len(points), dtype=bool)    # points: (N, 3) array of 3D points
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if keep[j] and np.linalg.norm(points[i] - points[j]) < threshold:
                keep[j] = False                # j is too close to i: drop it
    points = points[keep]
    

    or delete both points and add a single point with the average of their coordinates.
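
    A sketch of that averaging variant, assuming SciPy is available (points is an (N, 3) array; cKDTree.query_pairs finds every pair closer than the threshold):

    import numpy as np
    from scipy.spatial import cKDTree

    def merge_close_points(points, threshold):
        # Replace each pair of points closer than threshold by their midpoint
        pairs = cKDTree(points).query_pairs(r=threshold)
        drop, merged = set(), []
        for i, j in pairs:
            if i not in drop and j not in drop:
                merged.append((points[i] + points[j]) / 2.0)  # average coordinates
                drop.update((i, j))
        kept = [p for k, p in enumerate(points) if k not in drop]
        return np.vstack(kept + merged)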

    2) Given point 1, this question becomes "how do I connect a 3D point to the corresponding pixel in the left image?" (that is the only sensible interpretation). The answer is simply the projection equation. If you have K (intrinsic matrix), R (rotation matrix) and t (translation vector) from the calibration of the left camera, stack R and t into a 3x4 matrix

    [R|t]
    

    and then project the 3D point M, written in homogeneous coordinates (X, Y, Z, 1), to the image point m = (u, v, w):

    m = K*[R|t]*M
    

    divide m by its third coordinate w and you obtain

    m = (u', v', 1)
    

    u' and v' are the pixel coordinates in the left image.
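
    Put together as a small NumPy sketch (K, R, t and the example point are placeholders; in practice they come from your calibration):

    import numpy as np

    K = np.array([[1000.0,    0.0, 640.0],    # placeholder intrinsics
                  [   0.0, 1000.0, 360.0],
                  [   0.0,    0.0,   1.0]])
    R, t = np.eye(3), np.zeros((3, 1))        # placeholder extrinsics of the left camera

    P = K @ np.hstack((R, t))                 # the 3x4 projection matrix K[R|t]
    M = np.array([0.5, -0.2, 12.0, 1.0])      # a 3D point in homogeneous coordinates
    u, v, w = P @ M
    u_px, v_px = u / w, v / w                 # pixel coordinates in the left image

    Note that cv2.projectPoints performs the same projection and additionally applies the lens distortion coefficients from calibration.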