
Interpreting the rotation matrix with respect to a defined world coordinate system

In the image below, we see a defined world plane coordinate system (X,Y,0), where Z=0. As we can see, the camera is facing the defined world plane.

The world reference point (0,0,0) is located at the top left of the grid. The distance between every two yellow points is 40 cm.


I've calibrated my camera using a checkerboard and then used the built-in function cv2.solvePnP to estimate the rotation and translation vectors of the camera with respect to my defined world coordinates. The results are as follows:

   tvec_cam= [[-5.47884374]
   rvec_cam= [[-0.02823308]
              [ 0.08623225]
              [ 0.01563199]]

According to the results, (tx,ty,tz) seems to be right, as the camera is located in the negative quadrant of the X,Y world coordinates.

However, I'm confused about how to interpret the rotation vector.

Does the resulting rotation vector say that the camera coordinate axes are almost aligned with the world coordinate axes (i.e. almost no rotation)?

If yes, how could this be true? According to OpenCV's camera coordinates, the Z-axis of the camera points towards the scene (i.e. towards the world plane), the X-axis points towards the image right (i.e. opposite to the world X-axis) and the Y-axis points towards the image bottom (i.e. also opposite to the world Y-axis).

Moreover, what is the unit of tvec?

Note: I've drawn the orientation of the defined world coordinate axes according to the result of the translation vector (both tx and ty are negative).

The code I used for computing the rotation and translation vectors is shown below:

import cv2 as cv
import numpy as np

# 27 world points on the Z=0 plane: a 9x3 grid with 0.4 m (40 cm) spacing
WPoints = np.zeros((9*3,3), np.float64)
WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4

# load the intrinsic camera matrix (not a rotation matrix), here
#   [[4.38073915e+03 0.00000000e+00 1.00593352e+03]
#    [0.00000000e+00 4.37829226e+03 6.97020491e+02]
#    [0.00000000e+00 0.00000000e+00 1.00000000e+00]]
# and the distortion coefficients saved by the calibration script
with np.load('parameters_cam1.npz') as X:
    mtx, dist = X['mtx'], X['dist']

# imPoints: the 27 detected image points, in the same order as WPoints
ret, rvecs, tvecs = cv.solvePnP(WPoints, imPoints, mtx, dist)



The code for estimating the intrinsics is shown below:

import numpy as np
import cv2
import glob
import argparse
import pathlib

ap = argparse.ArgumentParser()
ap.add_argument("-p", "--path", required=True, help="path to images folder")
ap.add_argument("-e", "--file_extension", required=False, default=".jpg",
                help="extension of images")
args = vars(ap.parse_args())
path = args["path"] + "*" + args["file_extension"]

# termination criteria for the sub-pixel corner refinement
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

# prepare object points, like (0,0,0), (0.03,0,0), (0.06,0,0) ....
# (7x5 inner corners with 3 cm squares, so the unit is metres)
objp = np.zeros((5*7,3), np.float32)
objp[:,:2] = np.mgrid[0:7,0:5].T.reshape(-1,2)*0.03

# Arrays to store object points and image points from all the images.
objpoints = [] # 3d points in real world space
imgpoints = [] # 2d points in image plane

images = glob.glob(path)

# folder where the images with the detected corners are saved
out_dir = 'foundContours'
pathlib.Path(out_dir).mkdir(parents=True, exist_ok=True)

found = 0
for fname in images:
  img = cv2.imread(fname)
  gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  # Find the chess board corners
  ret, corners = cv2.findChessboardCorners(gray, (7,5), None)
  # If found, add object points, image points (after refining them)
  if ret:
    objpoints.append(objp)   # every image uses the same 3D board points
    corners2 = cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
    imgpoints.append(corners2)   # was missing: calibration needs the 2D points
    # Draw and display the corners
    img = cv2.drawChessboardCorners(img, (7,5), corners2, ret)
    found += 1
    cv2.imshow('img', img)
    cv2.waitKey(500)
    # save the image with the detected corners
    image_name = out_dir + '/calibresult' + str(found) + '.jpg'
    cv2.imwrite(image_name, img)

cv2.destroyAllWindows()
print("Number of images used for calibration: ", found)

# image size is passed as (width, height)
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints,
                                                   gray.shape[::-1], None, None)

# save parameters needed for undistortion and for solvePnP
np.savez('parameters_cam1.npz', mtx=mtx, dist=dist, rvecs=rvecs, tvecs=tvecs)

print ("Camera Matrix = |fx  0 cx|")
print ("                | 0 fy cy|")
print ("                | 0  0  1|")
print (mtx)
print('distortion coefficients=\n', dist)
print('rotation vector for each image=', *rvecs, sep = "\n")
print('translation vector for each image=', *tvecs, sep= "\n")

I hope you can help me understand this.

Thanks in advance


  • First, rvec is a rotation vector in axis-angle representation: its direction is the rotation axis and its norm is the rotation angle in radians.

    You can obtain the rotation matrix using cv2.Rodrigues(). For your data, I get almost the identity:

    [[ 0.99616253 -0.01682635  0.08588995]
     [ 0.01439347  0.99947963  0.02886672]
     [-0.08633098 -0.02751969  0.99588635]]
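    For reference, cv2.Rodrigues(rvec) returns a (matrix, jacobian) tuple. The sketch below re-implements the same Rodrigues formula, R = I + (sin θ / θ) K + ((1 - cos θ) / θ²) K², where θ is the norm of rvec and K its skew-symmetric matrix, in plain NumPy. Using NumPy instead of OpenCV is only an assumption so that the snippet runs on its own; it is applied to the rvec from the question.

```python
import numpy as np

def rodrigues(rvec):
    """Axis-angle vector -> 3x3 rotation matrix (same formula as cv2.Rodrigues)."""
    r = np.asarray(rvec, dtype=np.float64).reshape(3)
    theta = np.linalg.norm(r)              # rotation angle in radians
    if theta < 1e-12:
        return np.eye(3)                   # no rotation
    # skew-symmetric (cross-product) matrix of the unnormalized vector r
    K = np.array([[0.0, -r[2], r[1]],
                  [r[2], 0.0, -r[0]],
                  [-r[1], r[0], 0.0]])
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

rvec_cam = np.array([[-0.02823308], [0.08623225], [0.01563199]])
R = rodrigues(rvec_cam)
print(np.round(R, 8))
```

    With OpenCV installed, cv2.Rodrigues(rvec_cam)[0] should give the same matrix.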

    Now, given the directions of x and y in your picture, the z-axis points downwards, into the plane (apply the right-hand rule carefully). This explains why the z-axis of the camera is almost aligned with the z-axis of your world reference frame.
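    To put numbers on "almost the identity", the sketch below computes the rotation angle encoded by the rvec from the question (the norm of the vector) and compares it with what the question expected: a world frame whose X and Y are both opposite to the camera's x and y, which is a 180 degree rotation about z. The flipped matrix is an illustration, not something returned by solvePnP.

```python
import numpy as np

rvec_cam = np.array([-0.02823308, 0.08623225, 0.01563199])

# Axis-angle: the norm is the rotation angle (radians), the direction is the axis.
angle_rad = np.linalg.norm(rvec_cam)
axis = rvec_cam / angle_rad
print(f"measured: {np.degrees(angle_rad):.2f} deg about axis {np.round(axis, 3)}")

# What the question expected: world X and Y both opposite to camera x and y.
R_flipped = np.diag([-1.0, -1.0, 1.0])
# Angle of a rotation matrix from its trace: cos(angle) = (trace - 1) / 2
angle_flipped = np.degrees(np.arccos((np.trace(R_flipped) - 1.0) / 2.0))
print(f"flipped X/Y frame would be a {angle_flipped:.0f} deg rotation")
```

    The measured rotation is only a few degrees, far from 180 degrees, which is consistent with the world axes being nearly parallel to the camera axes.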

    Edit: Digging a little bit further, from the code you posted:

    WPoints = np.zeros((9*3,3), np.float64)
    WPoints[:,:2] = np.mgrid[0:9,0:3].T.reshape(-1,2)*0.4

    The values for X and Y are all positive and increase to the right and towards the bottom respectively, so you are in fact using the usual convention; only the arrows you drew in the picture are wrong.
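    To see the ordering concretely, the snippet below rebuilds WPoints exactly as in the question and prints a few entries. X varies fastest (9 points per row, 0.4 apart) and Y increases once per row. Since solvePnP pairs points by index, the i-th detected image point has to correspond to the i-th row of WPoints.

```python
import numpy as np

# Same construction as in the question: a 9x3 grid on the Z=0 plane, 0.4 m spacing.
WPoints = np.zeros((9*3, 3), np.float64)
WPoints[:, :2] = np.mgrid[0:9, 0:3].T.reshape(-1, 2) * 0.4

print(WPoints[:3])    # start of the first row: X = 0, 0.4, 0.8 with Y = 0
print(WPoints[9:11])  # start of the second row: X = 0, 0.4 with Y = 0.4
```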

    Edit: concerning the interpretation of the translation vector, in the OpenCV convention the points in the camera reference frame are obtained from the points in the world reference frame like this:

    |x_cam|          |x_world|
    |y_cam| = Rmat * |y_world| + tvec
    |z_cam|          |z_world|
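    The same relation, followed by the pinhole projection with the intrinsic matrix, can be sketched in NumPy (lens distortion ignored). K is the matrix from the question's code comment; R = identity and tvec = (-5.4, -3.1, 24.1) are assumptions taken from the rounded values discussed in this answer, and the world point is one grid point. The sketch also addresses the unit question: scaling the world points and tvec by the same factor (metres to centimetres here) leaves the pixels unchanged, so tvec is expressed in whatever unit the object points passed to solvePnP use, i.e. metres for WPoints built with the 0.4 spacing.

```python
import numpy as np

# Intrinsic matrix from the question (fx, fy, cx, cy in pixels).
K = np.array([[4.38073915e+03, 0.0, 1.00593352e+03],
              [0.0, 4.37829226e+03, 6.97020491e+02],
              [0.0, 0.0, 1.0]])

def project(X_world, R, tvec, K):
    """World point -> pixel (u, v), ignoring lens distortion."""
    x_cam = R @ X_world + tvec          # world frame -> camera frame
    uvw = K @ x_cam                     # pinhole projection (homogeneous)
    return uvw[:2] / uvw[2]

# Assumed pose: no rotation, rounded tvec (metres).
R = np.eye(3)
tvec = np.array([-5.4, -3.1, 24.1])
X_world = np.array([0.4, 0.8, 0.0])     # a grid point in metres

uv_m = project(X_world, R, tvec, K)
# Same scene expressed in centimetres: world points and tvec both scale by 100.
uv_cm = project(100 * X_world, R, 100 * tvec, K)
print(uv_m, uv_cm)   # the two pixel positions are the same
```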

    With this convention, tvec is the position of the world origin in the camera reference frame. What's more easily interpretable is the position of the camera origin in the world reference frame, which can be obtained as:

    cam_center = -R_inv @ tvec

    where R_inv is the inverse of the rotation matrix, which for a rotation matrix is simply its transpose (R_inv = Rmat.T). Here the rotation matrix is almost the identity, so a quick approximation would be -tvec, which is roughly (5.4, 3.1, -24.1).
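    A minimal NumPy sketch of that last step, inverting x_cam = Rmat * x_world + tvec. It uses the rotation matrix printed above; since only the first component of tvec is shown in the question, the tvec below is an assumption built from the rounded values (5.4, 3.1, -24.1) quoted for -tvec. With the real solvePnP output you would use the full tvec instead.

```python
import numpy as np

# Rotation matrix printed above (cv2.Rodrigues of the question's rvec).
Rmat = np.array([[ 0.99616253, -0.01682635,  0.08588995],
                 [ 0.01439347,  0.99947963,  0.02886672],
                 [-0.08633098, -0.02751969,  0.99588635]])
# Assumed tvec: rounded values, since the question shows only its first component.
tvec = np.array([[-5.4], [-3.1], [24.1]])

# For a rotation matrix the inverse is the transpose.
R_inv = Rmat.T
cam_center = -R_inv @ tvec        # camera position in world coordinates

print(cam_center.ravel())
# Sanity check: mapping the camera centre into the camera frame gives the origin.
print((Rmat @ cam_center + tvec).ravel())
```

    Because tz is large, even this small rotation moves the result noticeably away from the plain -tvec approximation, so the full formula is worth using.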