python opencv camera camera-calibration camera-matrix

Python calibrate camera

I have the following image I1. I did not capture it; I downloaded it from Google.

I apply a known homography H to I1 to obtain the following image I2.

I want to assume that a camera took this shot, I2. I do not know the camera matrix of this camera and I want to find it. To find this camera matrix mtx, I am using the OpenCV camera calibration method: ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None, flags=cv2.CALIB_FIX_ASPECT_RATIO | cv2.CALIB_FIX_K1 | cv2.CALIB_FIX_K2 | cv2.CALIB_FIX_K3 | cv2.CALIB_FIX_K4 | cv2.CALIB_FIX_K5)

This is done using a square and its real-world and image coordinates. I choose a square in image I1 and get the corresponding corner points of the square in I2 using homography H. Since I know that these corresponding points in I2 form a square, I should be able to get the camera matrix from these points. However, when I take the same square at a different location in the image, I get a different camera matrix. Why is this? What am I doing wrong and how can I fix it? How do I calculate the correct camera matrix?

An example is shown below. For these two chosen squares, I get different values of mtx from the calibrateCamera function.

NOTE: The red points in the above images are not the corner points of a perfect square in I1. I have just roughly marked them to convey my point that when I take two squares of the same size but at different locations, I get different values for the camera matrix.

Solution

This is a good question which involves several important issues with calibration and computational geometry. I'm going to provide an in-depth answer that I hope will make these things clear.

When performing camera calibration, there are three reasons why you can get different intrinsic matrices if you repeat the calibration using different sets of correspondences.

1. The correspondences are noisy.
2. The camera calibration problem is under-determined. This means there is not enough correspondence information to resolve all camera parameters uniquely.
3. The camera calibration uses an imprecise or overly restrictive camera model.

Reason 1 should be fairly obvious. If the correspondences are corrupted by measurement noise, then you will generally obtain different calibrations if you use different sets of correspondences. This is because during calibration there is an optimization process where the camera parameters are optimized to best fit the correspondences. When there is noise, the best fit can vary depending on the measured noise.

Reason 2 happens if you try to calibrate using insufficient information. For example, if you only had three correspondences per image, the calibration problem is under-determined. You can think of this through counting parameters. Three correspondences provide 6 constraints on the calibration equations (two for each correspondence, through its x and y position). Now, when we calibrate we must jointly estimate the pose of the calibration object (which has 6 degrees of freedom per image), plus the unknowns for the intrinsics (focal length, principal point, distortion, etc.). There are therefore more unknowns than constraints, so there can be infinitely many calibrations! If you therefore chose different sets of three correspondences, the returned calibration (if one is returned at all) will never be correct and will generally never be the same.
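
The counting argument above can be written out as a tiny helper. This is only an illustrative sketch: count_single_view is a hypothetical name, and the choice of 4 intrinsic unknowns (fx, fy, cx, cy, no distortion) is an assumption made for the example.

```python
def count_single_view(num_points, num_intrinsics):
    """Return (constraints, unknowns) for one image of a calibration object."""
    constraints = 2 * num_points    # each correspondence gives an x and a y equation
    unknowns = 6 + num_intrinsics   # 6 pose parameters plus the intrinsic unknowns
    return constraints, unknowns

# Assumed example: 3 correspondences, intrinsics fx, fy, cx, cy (no distortion).
constraints, unknowns = count_single_view(num_points=3, num_intrinsics=4)
print(f"{constraints} constraints vs {unknowns} unknowns")
print("under-determined" if constraints < unknowns else "enough constraints")
```

With these assumed numbers the problem has 6 constraints against 10 unknowns, so no unique calibration exists.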

Reason 3 is more subtle. To explain this, remember that calibration can be done by specifying a camera with different numbers of unknown intrinsic parameters. It is often good to reduce the number of unknowns in cases where you have very limited calibration information. For example, if calibrating with a single image, a planar calibration object will give you a maximum of 8 constraints per image on the calibration (because a homography has 8 degrees of freedom). 6 constraints are required to get the plane's pose, so we are left with 2 remaining constraints per image. If you only have a single image, you cannot do a calibration when there are more than 2 unknowns (e.g. focal lengths and lens distortion). Therefore, if we want to calibrate using a single image, we must reduce the number of unknowns.
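
The 8-constraint limit for a plane can be checked numerically. The sketch below builds the standard linear (DLT) system for a homography from noiseless coplanar correspondences and shows that its rank stays at 8 no matter how many points are used. The homography values and the helper names dlt_matrix and apply_homography are made up for the illustration.

```python
import numpy as np

def dlt_matrix(src, dst):
    """Stack the two linear homography equations contributed by each correspondence."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    return np.array(rows, dtype=float)

def apply_homography(H, pts):
    """Map 2D points through H and divide out the homogeneous coordinate."""
    homog = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

# Arbitrary example homography (an assumption, not taken from the question).
H = np.array([[1.2, 0.1, 0.3],
              [-0.2, 0.9, 0.5],
              [0.1, 0.05, 1.0]])

square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
grid = np.array([[x, y] for x in np.linspace(0, 1, 5) for y in np.linspace(0, 1, 5)])

rank_4 = np.linalg.matrix_rank(dlt_matrix(square, apply_homography(H, square)), tol=1e-8)
rank_25 = np.linalg.matrix_rank(dlt_matrix(grid, apply_homography(H, grid)), tol=1e-8)

# 4 points and 25 points give the same number of independent equations.
print(rank_4, rank_25)
print("left for intrinsics after the 6 pose parameters:", rank_4 - 6)
```

Going from 4 to 25 coplanar points adds rows to the system but no new independent constraints, which is why one planar view leaves only 2 constraints for the intrinsics.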

What's happening in your case

In your case you have reduced the unknowns to a single focal length (f=fx=fy) and the camera's principal point. That's 3 unknowns, but recall that to do calibration with a single image means you can only have a maximum of 2 intrinsic unknowns. Therefore you have an under-constrained problem (see reason 2 above).

Now, you might decide to overcome this by fixing the principal point to the image centre, which is a common thing to do as it is often a good approximation for the real principal point. Now you have a calibration problem with 1 unknown intrinsic (f). The important question is: if we try to calibrate f using a single image and 4 noiseless correspondences, can we expect to get the same value using different sets of correspondences? You might think yes, but the answer is no.

The reason is that the calibration process will be solving an over-constrained problem (8 constraints and 7 unknowns: 6 for the pose plus f). It will generally solve this (as OpenCV's calibrateCamera method does) using a function minimization process. In OpenCV, it is done by minimizing the reprojection error. The solution to this will vary depending on the correspondences you provide. This is rather tricky to imagine, so consider a different problem, where you're trying to fit a straight line to points on a slightly curved line. The straight line is an overly simplified model for the data. If we try to fit the line to the curved data by sampling two points from it, the best-fitting solution will change depending on which points are sampled.
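
The line-fitting analogy can be reproduced in a few lines. This is a toy sketch only: the curve x + 0.1x² and the sample positions are arbitrary choices, and the helper names are hypothetical.

```python
import numpy as np

def curve(x):
    """A slightly curved 'true' relationship: a line plus a small quadratic term."""
    return x + 0.1 * x ** 2

def line_through(x_samples):
    """Fit a straight line (slope, intercept) to the curve at the given x positions."""
    x = np.asarray(x_samples, dtype=float)
    slope, intercept = np.polyfit(x, curve(x), deg=1)
    return slope, intercept

# Same model, same noiseless data source, different sample locations.
slope_a, _ = line_through([0.0, 0.5])
slope_b, _ = line_through([0.5, 1.0])
print(slope_a, slope_b)
```

The two slopes come out at roughly 1.05 and 1.15. There is no noise anywhere; the difference comes purely from fitting a model that is too simple at two different places, which is the same mechanism that makes f change from square to square.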

In your particular case you can eliminate problems 2 and 3 by using an intrinsic matrix with exactly 2 unknowns (fx and fy). To do this, remove the flag that fixes the aspect ratio (cv2.CALIB_FIX_ASPECT_RATIO) and fix the principal point to the image centre (for example with cv2.CALIB_FIX_PRINCIPAL_POINT).
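
To illustrate why exactly 2 unknowns work with a single planar view, here is a NumPy sketch on synthetic data. It uses the classic closed-form constraints from planar calibration (the first two columns of the rotation must be orthogonal and of equal length) rather than cv2.calibrateCamera, so it is not a description of what OpenCV does internally. The camera values, the pose, and the function names are all made up for the demonstration, and zero skew with a known principal point is assumed.

```python
import numpy as np

def homography_from_points(src, dst):
    """Direct linear transform: find H such that dst ~ H @ src (homogeneous)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.array(rows, dtype=float))
    return vt[-1].reshape(3, 3)  # null vector of the 8x9 system

def focal_lengths(H, cx, cy):
    """Recover (fx, fy) from one plane-to-image homography with known (cx, cy)."""
    shift = np.array([[1.0, 0.0, -cx], [0.0, 1.0, -cy], [0.0, 0.0, 1.0]])
    Hc = shift @ H  # move the principal point to the origin
    h1, h2 = Hc[:, 0], Hc[:, 1]
    # r1 . r2 = 0 and |r1| = |r2| are linear in a = 1/fx^2 and b = 1/fy^2.
    A = np.array([[h1[0] * h2[0], h1[1] * h2[1]],
                  [h1[0] ** 2 - h2[0] ** 2, h1[1] ** 2 - h2[1] ** 2]])
    rhs = -np.array([h1[2] * h2[2], h1[2] ** 2 - h2[2] ** 2])
    a, b = np.linalg.solve(A, rhs)
    return 1.0 / np.sqrt(a), 1.0 / np.sqrt(b)

# Synthetic ground truth (assumed values, not from the question).
fx, fy, cx, cy = 800.0, 820.0, 320.0, 240.0
K = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])
alpha, beta = 0.5, -0.4  # tilt of the plane about the x and y axes, in radians
Rx = np.array([[1.0, 0.0, 0.0],
               [0.0, np.cos(alpha), -np.sin(alpha)],
               [0.0, np.sin(alpha), np.cos(alpha)]])
Ry = np.array([[np.cos(beta), 0.0, np.sin(beta)],
               [0.0, 1.0, 0.0],
               [-np.sin(beta), 0.0, np.cos(beta)]])
R = Rx @ Ry
t = np.array([0.2, -0.1, 4.0])

def project(points_xy):
    """Project points lying on the plane Z = 0 into the image."""
    pts = np.column_stack([points_xy, np.zeros(len(points_xy))])
    pix = (pts @ R.T + t) @ K.T
    return pix[:, :2] / pix[:, 2:3]

unit_square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
estimates = []
for offset in ([0.0, 0.0], [1.5, -1.0]):  # the same square at two locations
    image_corners = project(unit_square + np.array(offset))
    H = homography_from_points(unit_square, image_corners)
    estimates.append(focal_lengths(H, cx, cy))
    print(offset, estimates[-1])
```

Both squares give approximately (800, 820), because with 2 intrinsic unknowns and 2 remaining constraints the noiseless problem is exactly determined. Note that this closed form breaks down when the plane is parallel to the image plane, and with real, noisy corner points the two estimates will still differ somewhat (reason 1).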