Tags: python, matplotlib, open3d

Project 3D mesh on 2D image using camera intrinsic matrix


I've been trying to use the HOnnotate dataset to extract perspective-correct hand and object masks, as shown in the images of Task 3 of the Hands-2019 challenge.

The dataset (version 3) comes with the following annotations:

annotations:
    The annotations are provided in pickled files under the meta folder of each sequence. The pickle files in the training data contain a dictionary with the following keys:
    objTrans: A 3x1 vector representing object translation
    objRot: A 3x1 vector representing object rotation in axis-angle representation
    handPose: A 48x1 vector representing the 3D rotation of the 16 hand joints (including the root joint) in axis-angle representation. The ordering of the joints follows the MANO model convention (see joint_order.png) and can be directly fed to the MANO model.
    handTrans: A 3x1 vector representing the hand translation
    handBeta: A 10x1 vector representing the MANO hand shape parameters
    handJoints3D: A 21x3 matrix representing the 21 3D hand joint locations
    objCorners3D: An 8x3 matrix representing the 3D bounding box corners of the object
    objCorners3DRest: An 8x3 matrix representing the 3D bounding box corners of the object before applying the transformation
    objName: Name of the object as given in YCB dataset
    objLabel: Object label as given in YCB dataset
    camMat: Intrinsic camera parameters
    handVertContact: A 778D boolean vector in which each element indicates whether the corresponding MANO vertex is in contact with the object. A MANO vertex is in contact if its distance to the object surface is < 4 mm
    handVertDist: A 778D float vector representing the distance of MANO vertices to the object surface.
    handVertIntersec: A 778D boolean vector specifying if the MANO vertices are inside the object surface.
    handVertObjSurfProj: A 778x3 matrix representing the projection of MANO vertices on the object surface.
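
For orientation, here is a minimal sketch of how one of these pickle files can be loaded and inspected. The file path is hypothetical (each sequence has its own meta folder as described above), and the latin1 encoding is a guard in case the pickles were written under Python 2:

import pickle
import numpy as np

# hypothetical path; substitute your own sequence and frame
with open("train/ABF10/meta/0000.pkl", "rb") as f:
    anno = pickle.load(f, encoding="latin1")

print(sorted(anno.keys()))
print(anno['camMat'])                          # 3x3 intrinsic matrix
print(np.asarray(anno['handJoints3D']).shape)  # (21, 3)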

It also comes with a visualization script (https://github.com/shreyashampali/ho3d) that can render the annotations either as 3D meshes (using Open3D) or as 2D projections of the object corners and hand joints (using Matplotlib):

[image: Open3D 3D mesh rendering] [image: Matplotlib 2D projection of object corners and hand joints]

What I am trying to do is project the visualization created by Open3D back onto the original image.

So far I have not been able to do this. What I have managed is to extract the point cloud from the 3D mesh and apply the camera intrinsics to it, which gives a perspective-correct projection. The question now is how to create a mask out of that point cloud, for both the hand and the object, like the one from the Open3D rendering.

# the code so far looks as follows
import cv2
import numpy as np
import open3d

# "mesh" is an Open3D triangle mesh, i.e. "open3d.geometry.TriangleMesh()"
pcd = open3d.geometry.PointCloud()
pcd.points = mesh.vertices
pcd.colors = mesh.vertex_colors
pcd.normals = mesh.vertex_normals

pts3D = np.asarray(pcd.points)
# hand/object lie along the negative z-axis, so flip y and z to correct
# the perspective before plotting with OpenCV
coord_change_mat = np.array([[1., 0., 0.], [0., -1., 0.], [0., 0., -1.]], dtype=np.float32)
pts3D = pts3D.dot(coord_change_mat.T)

# "anno['camMat']" is the 3x3 camera intrinsic matrix
img_points, _ = cv2.projectPoints(pts3D,
                                  np.zeros(3), np.zeros(3),  # no extra rotation/translation
                                  anno['camMat'],
                                  np.zeros(4, dtype='float32'))  # no lens distortion

# draw the perspective-correct point cloud back on the image
h, w = img.shape[:2]
for point in img_points:
    p1, p2 = int(point[0][0]), int(point[0][1])
    if 0 <= p1 < w and 0 <= p2 < h:  # skip points that fall outside the image
        img[p2, p1] = (255, 255, 255)
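
For reference, one rough way to densify these sparse projected points into a mask is to splat them into a binary image and close the gaps morphologically. This is only a sketch building on the snippet above, with a guessed kernel size, and not what the accepted solution below ends up doing:

# splat the projected points into a binary image
mask = np.zeros(img.shape[:2], dtype=np.uint8)
pts = img_points.reshape(-1, 2).astype(int)
inside = (0 <= pts[:, 0]) & (pts[:, 0] < mask.shape[1]) & \
         (0 <= pts[:, 1]) & (pts[:, 1] < mask.shape[0])
pts = pts[inside]
mask[pts[:, 1], pts[:, 0]] = 255

# close the gaps between neighbouring points; the 5x5 kernel is a guess to tune
kernel = np.ones((5, 5), np.uint8)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)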

[image: sparse projected point cloud drawn on the original image]

Basically, I'm trying to get this segmentation mask out:

[image: target segmentation mask for the hand and object]

PS. Sorry if this doesn't make much sense; I'm very new to 3D meshes, point clouds and their projections, and I don't know all the correct technical terms for them yet. Leave a comment with a question and I'll try to explain as best I can.


Solution

  • Turns out there is an easy way to do this task using Open3D and the camera intrinsic values. Basically, we instruct Open3D to render the image from the camera's point of view.

    
    import cv2
    import numpy as np
    import open3d
    import open3d.visualization.rendering as rendering
    
    # Create a renderer with a set image width and height
    render = rendering.OffscreenRenderer(img_width, img_height)
    
    # set up the camera intrinsic values
    # (kept for reference; set_projection below takes the raw 3x3 matrix,
    # so this PinholeCameraIntrinsic object is not strictly required)
    pinhole = open3d.camera.PinholeCameraIntrinsic(img_width, img_height, fx, fy, cx, cy)
        
    # Pick a background colour of the rendered image, I set it as black (default is light gray)
    render.scene.set_background([0.0, 0.0, 0.0, 1.0])  # RGBA
    
    # now create your mesh
    mesh = open3d.geometry.TriangleMesh()
    # define further mesh properties, shape, vertices etc  (omitted here)  
    mesh.paint_uniform_color([1.0, 0.0, 0.0]) # set Red color for mesh 
    
    # Define a simple unlit Material.
    # (The base color does not replace the mesh's own colors.)
    # Note: newer Open3D releases (>= 0.16) renamed this class to
    # open3d.visualization.rendering.MaterialRecord.
    mtl = open3d.visualization.rendering.Material()
    mtl.base_color = [1.0, 1.0, 1.0, 1.0]  # RGBA
    mtl.shader = "defaultUnlit"
    
    # add mesh to the scene
    render.scene.add_geometry("MyMeshModel", mesh, mtl)
    
    # render the scene with respect to the camera: camMat is the 3x3 intrinsic
    # matrix, 0.1 and 1.0 are the near/far clipping planes, and 640x480 is the
    # image size
    render.scene.camera.set_projection(camMat, 0.1, 1.0, 640, 480)
    img_o3d = render.render_to_image()
    
    # we can now save the rendered image right at this point 
    open3d.io.write_image("output.png", img_o3d, 9)
    
    
    # Optionally, we can convert the image to OpenCV format and play around.
    # For my use case I mapped it onto the original image to check quality of 
    # segmentations and to create masks.
    # (Note: OpenCV expects the color in BGR format, so swap red and blue.)
    img_cv2 = cv2.cvtColor(np.array(img_o3d), cv2.COLOR_RGBA2BGR)
    cv2.imwrite("cv_output.png", img_cv2)
    

    [image: Open3D render of the mesh from the camera's point of view]

    This solution borrows a lot from this answer.