opencv computer-vision camera-calibration

Why minimize squares of re-projection error during camera calibration?

I am working with the calibrateCamera function of OpenCV, and that works just fine. However, I'm having trouble understanding why it uses its particular cost function:

sqrt( 1/n *  sum( d(xi', xi)**2 ,i , 1, n))

where xi' are the re-projected (or model) coordinates and xi the raw image coordinates (see for instance this Question). Intuitively, I would write down the cost function as

1/n sum( d(xi', xi) , i, 1, n)

In other words, as the mean of the Euclidean distances between the points.

I understand that these expressions are different quantitatively. What I'm interested in is the qualitative difference between the preferred solutions of the two cost functions, and why the former is used in camera calibration.

Solution

The first quantity is the RMS of the re-projection error vector lengths. For the purposes of optimisation, the sum of squared errors (SSE) has the same minimiser, because the square root and the factor 1/n are both monotonic:

sum( d(xi', xi)**2 ) = sum( dot(xi' - xi, xi' - xi) )

Your alternative is the sum (or equivalently the mean) of error lengths:

sum( d(xi', xi) ) = sum( sqrt( dot(xi' - xi, xi' - xi) ) )

A third alternative would be the sum of absolute deviations:

sum( abs(xi'_x - xi_x) + abs(xi'_y - xi_y) )
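To make the three candidates concrete, here is a minimal sketch that evaluates them on the same set of points. The function name `reprojection_costs` and the toy data are made up for illustration and are not part of OpenCV; the last point is given a deliberately large error.

```python
import numpy as np

def reprojection_costs(projected, observed):
    """Return (rms, mean_distance, sum_abs_dev) for two (n, 2) point arrays."""
    diff = np.asarray(projected, dtype=float) - np.asarray(observed, dtype=float)
    dist = np.linalg.norm(diff, axis=1)   # d(xi', xi) for every point
    rms = np.sqrt(np.mean(dist ** 2))     # sqrt(1/n * sum(d**2))
    mean_dist = np.mean(dist)             # 1/n * sum(d)
    sum_abs = np.sum(np.abs(diff))        # sum(|dx| + |dy|)
    return rms, mean_dist, sum_abs

# Hypothetical toy data: three small errors and one gross error of length 5.
observed = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
offsets = np.array([[0.1, 0.0], [0.0, 0.1], [0.1, 0.0], [3.0, 4.0]])
projected = observed + offsets

rms, mean_dist, sum_abs = reprojection_costs(projected, observed)
print(rms, mean_dist, sum_abs)
```

On this data the RMS comes out noticeably larger than the mean distance, because squaring gives the single large error more weight.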

So the question boils down to: why do we prefer the least-squares solution?

The main reason is that, if the errors are zero-mean, independent and normally distributed with equal variance (sufficient but not necessary), the least-squares solution is the max-likelihood estimate, that is, the solution (out of all possible solutions) that makes the observed data most probable.
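The max-likelihood argument can be sketched in a few lines. The symbols below are assumptions introduced for this sketch: theta stands for all calibration parameters, x_i'(theta) for the re-projected point, and sigma for a common, isotropic noise standard deviation.

```latex
% Assumption: x_i = x_i'(\theta) + e_i with e_i \sim \mathcal{N}(0, \sigma^2 I_2), independent.
\begin{align}
L(\theta) &= \prod_{i=1}^{n} \frac{1}{2\pi\sigma^2}
             \exp\!\left( -\frac{\lVert x_i'(\theta) - x_i \rVert^2}{2\sigma^2} \right) \\
-\log L(\theta) &= n \log(2\pi\sigma^2)
             + \frac{1}{2\sigma^2} \sum_{i=1}^{n} \lVert x_i'(\theta) - x_i \rVert^2 \\
\arg\max_\theta L(\theta) &= \arg\min_\theta \sum_{i=1}^{n} d(x_i', x_i)^2
\end{align}
```

The first term of the negative log-likelihood does not depend on theta, so maximising the likelihood is the same as minimising the sum of squared distances.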

A second reason is that the least-squares formulation allows for nice and relatively simple mathematical derivations (see the Levenberg-Marquardt (LM) algorithm). With the sum of lengths, you would have to take derivatives of an expression involving square roots, which are undefined wherever a residual is exactly zero. With least absolute deviations (LAD), you would need derivatives of the absolute value function, which is likewise not differentiable at zero, and local update steps may not be unique.
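To see the difference in the derivatives, write r_i(theta) = x_i'(theta) - x_i for the residual of point i and J_i for its Jacobian with respect to the parameters theta (both symbols are introduced here for illustration). A rough comparison of the gradients:

```latex
\begin{align}
\nabla_\theta \sum_i \lVert r_i \rVert^2 &= 2 \sum_i J_i^{\top} r_i \\
\nabla_\theta \sum_i \lVert r_i \rVert   &= \sum_i \frac{J_i^{\top} r_i}{\lVert r_i \rVert}
  \qquad \text{(undefined when } r_i = 0\text{)}
\end{align}
% With all residuals stacked into r and all Jacobians into J, a damped
% Gauss-Newton (LM-style) step \delta in its simplest form solves
\begin{equation}
\left( J^{\top} J + \lambda I \right) \delta = - J^{\top} r
\end{equation}
```

The squared form leads directly to a linear system for the update step, which is what makes it convenient; the sum of lengths has no such closed-form step and its gradient breaks down at a perfect fit.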

As a side note: the assumption of normally distributed errors is fair, but if you analyse the residual distribution after camera calibration, you might find it violated if the calibration was not done with great care. For some other error distributions, notably the Laplace distribution, the LAD is in fact the max-likelihood estimator.
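The qualitative difference between the estimators shows up most clearly in a much simpler problem than calibration: estimating a single location from 1D samples. There, the least-squares minimiser is the mean and the LAD minimiser is the median. The following toy sketch (made-up data, brute-force grid search, not a calibration routine) shows how one gross outlier affects each:

```python
import numpy as np

# Hypothetical 1D sample: four small errors plus one gross outlier.
samples = np.array([-0.2, -0.1, 0.0, 0.1, 10.0])

# Brute-force search over candidate estimates with a step of 0.001.
candidates = np.linspace(-1.0, 11.0, 12001)
deviations = samples[None, :] - candidates[:, None]

sse = (deviations ** 2).sum(axis=1)    # least-squares cost per candidate
sad = np.abs(deviations).sum(axis=1)   # least-absolute-deviations cost

best_ls = candidates[np.argmin(sse)]   # lands on the sample mean
best_lad = candidates[np.argmin(sad)]  # lands on the sample median

print(best_ls, samples.mean())
print(best_lad, np.median(samples))
```

The least-squares estimate is pulled towards the outlier, while the LAD estimate stays with the bulk of the data. The same tendency applies to re-projection residuals: squared costs are more sensitive to a few bad detections.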

There are many more details on the calibration process in this article: https://calib.io/blogs/knowledge-base/camera-calibration