How to project a point cloud to a depth image using Open3D's project_to_depth_image?

My end goal is to generate a top down / side orthographic (or close to orthographic) views from a point cloud using Open3D (which is easy to install via pip install open3d)

I'm trying to find the simplest method. One option is to use the visualiser's capture_screen_float_buffer however I'd like to avoid using 2 renders (one for each view).

I've also spotted open3d.t.geometry.PointCloud.project_to_depth_image.

I've tried to adapt their geometry/point_cloud_to_depth.py example to load a point cloud instead of using an RGBD image which gets converted to a pointcloud (like their example).

The issue I have is that I get all zeros for the depth image from an arbitrary point cloud.

Here's the example as I've modified it (sans license for brevity):


import open3d as o3d
import numpy as np
import matplotlib.pyplot as plt
from math import radians

if __name__ == '__main__':

    def get_intrinsic(width, height):
        return o3d.core.Tensor([[1, 0, width * 0.5], 
                                [0, 1, height * 0.5],
                                [0, 0, 1]])

    def get_extrinsic(x = 0, y = 0, z = 0, rx = 0, ry = 0, rz = 0):
        extrinsic = np.eye(4)
        extrinsic[:3,  3] = (x, y, z)
        extrinsic[:3, :3] = o3d.geometry.get_rotation_matrix_from_axis_angle([radians(rx),radians(ry), radians(rz)])
        return extrinsic

    def compute_show_reprojection(pcd, width, height, intrinsic, extrinsic):
        depth_reproj = pcd.project_to_depth_image(width,
                                                  height,
                                                  intrinsic,
                                                  extrinsic,
                                                  depth_scale=5000.0,
                                                  depth_max=10.0)

        
        fig, axs = plt.subplots(1, 2)
        axs[0].imshow(np.asarray(depth.to_legacy()))
        axs[1].imshow(np.asarray(depth_reproj.to_legacy()))
        plt.show()
        
    
    width, height = 640, 480
    intrinsic = get_intrinsic(width, height)
    extrinsic = get_extrinsic()
    # original example data
    tum_data = o3d.data.SampleTUMRGBDImage()
    depth = o3d.t.io.read_image(tum_data.depth_path)
    pcd = o3d.t.geometry.PointCloud.create_from_depth_image(depth,
                                                            intrinsic,
                                                            extrinsic,
                                                            depth_scale=5000.0,
                                                            depth_max=10.0)

    compute_show_reprojection(pcd, width, height, intrinsic, get_extrinsic(z=1, rz=-45))

    # testing a differen point cloud
    pcd_data = o3d.data.PCDPointCloud()
    pcd = o3d.io.read_point_cloud(pcd_data.path)
    pcd = o3d.t.geometry.PointCloud.from_legacy(pcd)
    c   = pcd.get_center().numpy()
    minb= pcd.get_min_bound().numpy()
    maxb= pcd.get_max_bound().numpy()
    print('point cloud center', c)
    print('min', minb)
    print('max', maxb)
    compute_show_reprojection(pcd, width, height, intrinsic, get_extrinsic(c[0], c[1] + 1, c[2] + 1))

This is a plot of the TUM depth image and point cloud projected image (where I experimented with a different camera pose) and that works as expected:

This is a lot of the same TUM depth image and a "blank" image on the right where I'm expecting a different depth map from an arbitrary point cloud:

I'm not sure if it's a matter of not placing the camera in the right place, a limitation of project_to_depth_image not able to project from a sparse/arbitrary point cloud or something else I'm might be missing.

Is it possible to project_to_depth_image with an arbitrary point cloud ? If so, how ? (If not, what Open3D techniques (other than creating a separate visualiser/render) can you recommend ?)

(I've tested with Open3D 0.16.0 on Windows 11 (which sadly means I can't rely on headless rendering)

Update: when you zoom into the second image a tiny tiny point cloud can actually be spoted.

I've made a bit of progress adding test keyboard shortcuts to move around a bit (changing the extrinsics matrix):

import open3d as o3d
import numpy as np
import matplotlib.pyplot as plt
from math import radians

import cv2
from time import time

if __name__ == '__main__':

    def get_intrinsic(width, height):
        return o3d.core.Tensor([[1, 0, width * 0.5], 
                                [0, 1, height * 0.5],
                                [0, 0, 1]])

    def get_extrinsic(x = 0, y = 0, z = 0, rx = 0, ry = 0, rz = 0):
        extrinsic = np.eye(4)
        extrinsic[:3,  3] = (x, y, z)
        extrinsic[:3, :3] = o3d.geometry.get_rotation_matrix_from_axis_angle([radians(rx),radians(ry), radians(rz)])
        return extrinsic

    def compute_show_reprojection(pcd, width, height, intrinsic, extrinsic, window_wait=3000):
        now = time()
        depth_reproj = pcd.project_to_depth_image(width,
                                                  height,
                                                  intrinsic,
                                                  extrinsic,
                                                  depth_scale=5000.0,
                                                  depth_max=10.0)

        
        # fig, axs = plt.subplots(1, 2)
        # axs[0].imshow(np.asarray(depth.to_legacy()))
        # axs[1].imshow(np.asarray(depth_reproj.to_legacy()))
        # plt.show()
        print(time() - now)
        cv2.imshow("depth", np.asarray(depth_reproj.to_legacy()))
        return cv2.waitKey(window_wait)
        
    
    width, height = 640, 480
    intrinsic = get_intrinsic(width, height)
    extrinsic = get_extrinsic()
    # original example data
    tum_data = o3d.data.SampleTUMRGBDImage()
    depth = o3d.t.io.read_image(tum_data.depth_path)
    pcd = o3d.t.geometry.PointCloud.create_from_depth_image(depth,
                                                            intrinsic,
                                                            extrinsic,
                                                            depth_scale=5000.0,
                                                            depth_max=10.0)

    # compute_show_reprojection(pcd, width, height, intrinsic, get_extrinsic(z=1, rz=-45))

    # testing a differen point cloud
    # pcd_data = o3d.data.PCDPointCloud()
    # pcd = o3d.io.read_point_cloud(pcd_data.path)
    # pcd = o3d.t.geometry.PointCloud.from_legacy(pcd)
    # test random points point cloud
    points = np.random.rand(100000, 3)
    point_cloud = o3d.geometry.PointCloud()
    point_cloud.points = o3d.utility.Vector3dVector(points)
    pcd = o3d.t.geometry.PointCloud.from_legacy(point_cloud)

    c   = pcd.get_center().numpy()
    minb= pcd.get_min_bound().numpy()
    maxb= pcd.get_max_bound().numpy()
    print('point cloud center', c)
    print('min', minb)
    print('max', maxb)
    key = ' '
    x, y, z = -0.95, -0.95, -0.95
    while key != ord('q'):
        key = compute_show_reprojection(pcd, width, height, intrinsic, get_extrinsic(x, y, z), 40)
        if key == ord('x'):
            x -= 0.1
            print(f"x, y, z = {x, y, z}")
    
        if key == ord('X'):
            x += 0.1
            print(x, y, z)
    
        if key == ord('y'):
            y -= 0.1
            print(f"x, y, z = {x, y, z}")
    
        if key == ord('Y'):
            y += 0.1
            print(f"x, y, z = {x, y, z}")
    
        if key == ord('z'):
            z -= 0.1
            print(f"x, y, z = {x, y, z}")
    
        if key == ord('Z'):
            z += 0.1
            print(f"x, y, z = {x, y, z}")

What's the easiest way to compute extrinsics to get a top down view and what other arguments could I use to get an orthogonal projection ?

Solution

Turns out I was pretty close with the update above: I just needed to have a focal length values in the intrinsics matrix to avoid extreme distortions.

In my case (virtual view, unrelated to a physical camera with proper intrinsics), I'm using a generic focal length scale:

def get_intrinsic(width, height, cscale=6064):
        return o3d.core.Tensor([[cscale, 0     , width * 0.5], 
                                [0     , cscale, height * 0.5],
                                [0     , 0     , 1]])

Here's a tweaked version of the above with a few more keyboard shortcuts:

import cv2
import numpy as np
import open3d as o3d


if __name__ == '__main__':

    def get_intrinsic(width, height, focal_dist=6064):
        return o3d.core.Tensor([[focal_dist, 0     , width * 0.5], 
                                [0     , focal_dist, height * 0.5],
                                [0     , 0     , 1]])

    def get_extrinsic(x = 0, y = 0, z = 0, rx = 0, ry = 0, rz = 0):
        extrinsic = np.eye(4)
        extrinsic[:3,  3] = (x, y, z)
        extrinsic[:3, :3] = o3d.geometry.get_rotation_matrix_from_axis_angle(np.radians(np.asarray((rx, ry, rz))))
        return extrinsic

    def compute_show_reprojection(pcd, width, height, intrinsic, extrinsic, depth_max=10.0, depth_scale=15.0, window_wait=3000):
        depth_reproj = pcd.project_to_depth_image(width,
                                                  height,
                                                  intrinsic,
                                                  extrinsic,
                                                  depth_scale=depth_scale,
                                                  depth_max=depth_max)

        
        depth_mat = np.asarray(depth_reproj.to_legacy())
        cv2.imshow("depth", depth_mat)

        return cv2.waitKey(window_wait), depth_mat
        
    
    width, height = 640, 480
    
    points = np.random.rand(100000, 3)
    point_cloud = o3d.geometry.PointCloud()
    point_cloud.points = o3d.utility.Vector3dVector(points)
    pcd = o3d.t.geometry.PointCloud.from_legacy(point_cloud)

    key = ' '
    pcd = pcd.cuda()
    print('pcd.is_cuda', pcd.is_cuda)
    
    x, y, z, = -0.44999999999999996, -0.5, 300.0
    depth_max = 10000.0
    depth_scale = 15.0
    intrinsic_scale = 1.0
    c_scale = 120000
    rx = ry = rz = 0
    
    while key != ord('q'):
        # update intrinsic, extrinsic matrices
        intrinsic = get_intrinsic(width * intrinsic_scale, height * intrinsic_scale, focal_dist = c_scale)
        extrinsic = get_extrinsic(x, y, z, rx, ry, rz)
        # reproject (and display) depth map from point cloud
        key, depth_img = compute_show_reprojection(pcd, width, height, intrinsic, extrinsic, \
                                                   depth_max=depth_max, depth_scale=depth_scale, window_wait=40)
        
        if key == ord('x'):
            x -= 0.1
            print(f"x, y, z = {x, y, z}")
    
        if key == ord('X'):
            x += 0.1
            print(x, y, z)
    
        if key == ord('y'):
            y -= 0.1
            print(f"x, y, z = {x, y, z}")
    
        if key == ord('Y'):
            y += 0.1
            print(f"x, y, z = {x, y, z}")
    
        if key == ord('z'):
            z -= 0.1
            print(f"x, y, z = {x, y, z}")
    
        if key == ord('Z'):
            z += 0.1
            print(f"x, y, z = {x, y, z}")

        if key == ord('m'):
            depth_max -= 0.1
            print('depth_max', depth_max)
    
        if key == ord('M'):
            depth_max += 0.1
            print('depth_max', depth_max)
    
    
        if key == ord('s'):
            depth_scale -= 0.1
            print('depth_max', depth_scale)
    
        if key == ord('S'):
            depth_scale += 0.1
            print('depth_max', depth_scale)
            cv2.imwrite('depth_reproj.png', depth_img)
    
        if key == ord('i'):
            intrinsic_scale -= 0.1
            print('intrinsic_scale', intrinsic_scale)
        if key == ord('I'):
            intrinsic_scale += 0.1
            print('intrinsic_scale', intrinsic_scale)

        if key == ord('c'):
            c_scale -= 100
            print('c_scale', c_scale)

        if key == ord('C'):
            c_scale += 100
            print('c_scale', c_scale)

        if key == ord('u'):
            rx += 10    
    
        if key == ord('j'):
            rx -= 10  

        if key == ord('i'):
            ry += 10    
    
        if key == ord('k'):
            ry -= 10  

        if key == ord('o'):
            rz += 10    
    
        if key == ord('l'):
            rz -= 10

The view is close to orthographic. So far the focal length seems to help reduce perspective distortion, but I'm not sure 100% it's a good method to optain an accurate (side/top down) view. It's a good enough workaround for now, but I'd be more than happy to see other methods.

(Note that this is slow on the CPU and relies on CUDA for decent frame rates. I had a few headaches compiling Open3D with CUDA on Windows with Python 3.8.10. Just in case someone else want to try this in a similar setup, I got around the issue by explicitly adding win_mode=0 to the CDLL call loading the CUDA .pyd. (e.g.

_pybind_cuda = _CDLL(
            str(next((_Path(__file__).parent / 'cuda').glob('pybind*'))), winmode=0)

in site-packages/open3d/__init_.py). More detials on github)