
Reconstructing a flying object's 3D trajectory off a single 2D-video


[Image: broadcast frame with the six tracked reference points marked on the court and backboard]

I am trying to reconstruct the basketball's 3D trajectory using only the broadcast feed. To do this, I had to calculate the homography matrix, so in each frame I successfully tracked the ball and six points whose locations are known in the "real world" (four on the court itself and two on the backboard), as seen in the picture.

Using the laws of physics I've also approximated the z-coordinate of the ball in every frame.
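For reference, a minimal no-drag projectile model of the kind presumably used for that approximation might look like this (the release height and vertical speed below are purely illustrative, not the values actually used):

```python
G = 9.81  # gravitational acceleration, m/s^2

def ball_height(t, z0=2.0, vz0=7.0):
    """Ball height (m) t seconds after release: constant gravity, no drag.
    z0 (release height) and vz0 (vertical release speed) are illustrative."""
    return z0 + vz0 * t - 0.5 * G * t ** 2

print(round(ball_height(0.5), 2))  # 4.27 with these made-up parameters
```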

Now I want to map the ball's location from 2D pixel coordinates to the real world. The code I have right now (attached below) takes the pixel location (u, v) and height z as input and outputs the (x, y, z) location. It works well for points on the court (i.e. z = 0); however, when I need to track something in the air (the ball), the results don't make sense. If anyone can tell me what I need to do to get this mapping right, I would appreciate it a lot.

# Make an empty list for the ball's 3D locations
ball_3d_location = []

# Fixed quantities: intrinsics approximated from the frame size
size          = frame_list[0].shape
focal_length  = size[1]
center        = (size[1] / 2, size[0] / 2)
camera_matrix = np.array(
    [[focal_length, 0, center[0]],
     [0, focal_length, center[1]],
     [0, 0, 1]], dtype="double"
)

def groundProjectPoint(image_point, z=0.0):
    # rotMat and translation_vector are set per frame by the loop below
    camMat = np.asarray(camera_matrix)
    iRot   = np.linalg.inv(rotMat)
    iCam   = np.linalg.inv(camMat)

    uvPoint = np.ones((3, 1))

    # Image point
    uvPoint[0, 0] = image_point[0]
    uvPoint[1, 0] = image_point[1]

    tempMat  = np.matmul(np.matmul(iRot, iCam), uvPoint)
    tempMat2 = np.matmul(iRot, translation_vector)

    s = (z + tempMat2[2, 0]) / tempMat[2, 0]
    wcPoint = np.matmul(iRot, (np.matmul(s * iCam, uvPoint) - translation_vector))

    # wcPoint[2] will not be exactly equal to z, but very close to it
    assert int(abs(wcPoint[2] - z) * (10 ** 8)) == 0
    wcPoint[2] = z

    return wcPoint
dist_coeffs = np.zeros((4, 1))  # Assuming no lens distortion

# The tracked points' coordinates in the "real world" (cm)
model_points = np.array([
    (0, 1524 / 2, 0),           # Baseline-sideline
    (0, -244, 0),               # Paint-sideline
    (579, -244, 0),             # Paint-FT
    (579, 1524 / 2, 0),         # Sideline-FT
    (122, -182.9 / 2, 396.32),  # Top left backboard
    (122, 182.9 / 2, 396.32)],  # Top right backboard
    dtype=np.float32)
for i, frame in enumerate(bball_frames):
    f = frame
    # This array has the pixel coordinates of the court & backboard points
    image_points = np.array([f.baseline_sideline,
                             f.paint_sideline,
                             f.paint_ft,
                             f.sideline_ft,
                             f.top_left_backboard,
                             f.top_right_backboard], dtype=np.float32)

    (success, rotation_vector, translation_vector) = cv2.solvePnP(
        model_points, image_points, camera_matrix, dist_coeffs,
        flags=cv2.SOLVEPNP_ITERATIVE)

    rotMat, _ = cv2.Rodrigues(rotation_vector)
    # We assume we know the ball's height in each frame from the laws of physics.
    ball_3d_location += [groundProjectPoint(image_point=ball_2d_location[i],
                                            z=ball_height[i])]
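As a sanity check on the back-projection algebra inside groundProjectPoint, here is a synthetic round trip with a made-up camera (the intrinsics and pose below are illustrative, not the broadcast camera's): project a known 3D point, then invert it with the same formula. The point is recovered whenever the supplied z is its true world-frame height, which suggests any problem lies in the inputs (approximated intrinsics, pose, or z) rather than the formula itself.

```python
import numpy as np

# Made-up intrinsics and pose (assumptions, for illustration only)
K = np.array([[1000.0, 0, 640],
              [0, 1000.0, 360],
              [0, 0, 1]])
theta = np.deg2rad(30)                     # camera tilted 30 degrees about x
R = np.array([[1, 0, 0],
              [0, np.cos(theta), -np.sin(theta)],
              [0, np.sin(theta),  np.cos(theta)]])
t = np.array([[0.0], [0.0], [2000.0]])

world = np.array([[100.0], [200.0], [300.0]])  # a "ball" at height z = 300

# Forward projection: X_cam = R X + t, pixel = K X_cam / depth
cam = R @ world + t
uv = (K @ cam) / cam[2, 0]

# Inverse mapping, identical to the algebra in groundProjectPoint
iR, iK = np.linalg.inv(R), np.linalg.inv(K)
uvPoint = np.array([[uv[0, 0]], [uv[1, 0]], [1.0]])
tempMat  = iR @ iK @ uvPoint
tempMat2 = iR @ t
s = (world[2, 0] + tempMat2[2, 0]) / tempMat[2, 0]
recovered = iR @ (s * iK @ uvPoint - t)

print(np.allclose(recovered, world))  # True
```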

EDIT: Is this how to calculate the basketball's height?


Solution

  • First, I want to clarify the planes of reference:

    1. The video you have is a 2D projection (viewing plane) of the 3D world, onto a plane perpendicular to the centerline of the camera lens.
    2. The shot arc is embedded in a plane (shot plane) which is perpendicular to the real-world (3D) floor, defined by the point of release (shooter's hand) and point of contact (backboard).

    The shot arc you see on the video is from the projection of that shot plane onto the viewing plane.

    I want to make sure we're clear with respect to your most recent comment: "So let's say I can estimate the shooting location on the court (x, y). Using the laws of physics I can say where the ball is in each frame, (x, y)-wise, and then from that and the pixel coordinates I can extract the height coordinate?"

    1. You can, indeed, estimate the (x,y) coordinate. However, I would not ascribe my approach to "the laws of physics". I would use analytic geometry.
    2. You can estimate, with good accuracy, the 3D coordinates of both the release point (from the known (x, y, 0) position of the shooter's feet) and the end point on the backboard (whose corners are known).
    3. Drop a perpendicular from each of these points to the floor (z=0). That line on the floor is the vertical projection of the arc to the floor -- these are the (x,y) coordinates of the ball in flight.
    4. For each video frame, drop a projected perpendicular from the ball's image to that line on the floor ... that gives you the (x,y) coordinates of the ball, for what it's worth.
    5. You have the definition (equation) of the view plane, the viewpoint (camera), and the arc plane. To determine the ball's position for each video frame, draw a line from the viewpoint, through the ball's image on the view plane. Determine the intersection of this line with the arc plane. That gives you the 3D coordinates of the ball in that frame.
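The ray/plane intersection in step 5 can be sketched as follows. This is an illustrative sketch, not tested against real broadcast data; the names `K`, `R`, `t` stand in for the intrinsics and the solvePnP pose, and `release`/`contact` for the estimated 3D release and backboard-contact points:

```python
import numpy as np

def ball_world_position(uv, K, R, t, release, contact):
    """Intersect the camera ray through pixel uv with the vertical shot plane
    defined by the 3D release and contact points (both length-3 arrays).
    K: 3x3 intrinsics; R: 3x3 rotation; t: 3x1 translation from solvePnP."""
    C = (-R.T @ t).ravel()                                       # camera center in world coords
    d = R.T @ np.linalg.inv(K) @ np.array([uv[0], uv[1], 1.0])   # ray direction in world coords
    # Shot plane: spans the release->contact direction and the vertical axis
    n = np.cross(contact - release, np.array([0.0, 0.0, 1.0]))   # plane normal
    s = n @ (release - C) / (n @ d)                              # ray parameter at intersection
    return C + s * d                                             # ball's 3D position
```

Calling this once per frame with the tracked ball pixel would give the full 3D arc directly, without needing a separate height estimate.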

    Does that clarify a useful line of attack?