computer-vision 3d-reconstruction structure-from-motion

3D Reconstruction and SfM Camera Intrinsic Parameters

I am trying to understand the basic principles of 3D reconstruction, and have chosen to play around with OpenMVG. However, I have seen evidence that the following concepts I'm asking about apply to all/most SfM/MVS tools, not just OpenMVG. As such, I suspect any Computer Vision engineer should be able to answer these questions, even if they have no direct OpenMVG experience.

I'm trying to fully understand intrinsic camera parameters, or as they seem to be called, "camera intrinsics" or "intrinsic parameters". According to OpenMVG's documentation, camera intrinsics depend on the type of camera used to take the pictures (i.e., the camera model), and OpenMVG supports five such models:

Pinhole: 3 intrinsic parameters (focal, principal point x, principal point y)
Pinhole Radial 1: 4 intrinsic params (focal, principal point x, principal point y, one radial distortion factor)
Pinhole Radial 3: 6 params (focal, principal point x, principal point y, 3 radial distortion factors)
Pinhole Brown: 8 params (focal, principal point x, principal point y, 5 distortion factors (3 radial + 2 tangential))
Pinhole w/ Fish-Eye Distortion: 7 params (focal, principal point x, principal point y, 4 distortion factors)

This is all explained on their wiki page describing the camera model, which is the subject of my question.

On that page there are several core concepts that I need clarification on:

focal plane: What is it, and how does it differ from the image plane (as shown in the diagram at the top of that page)?
focal distance/length: What is it?
principal point: What is it, and why should it ideally be the center of the image?
scale factor: Is this just an estimate of how far the camera is from the image plane?
distortion: What is it and what's the difference between its various subtypes:
- radial
- tangential
- fish-eye

Thanks in advance for any clarification/correction here!

Solution

I am unsure about the focal plane, so I will come back to it after I write about the other concepts you mention. Suppose you have a pinhole camera model with rectangular pixels, and let P=[X Y Z]^T be a point in camera space, with ^T denoting the transpose. In that case (assuming Z is the camera axis), this point can be projected as p=KP where K (the calibration matrix) is

f_x  0   c_x
0   f_y  c_y
0    0    1

(of course, you will want to divide p by its third coordinate after that).
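
To make the projection concrete, here is a minimal NumPy sketch of p=KP followed by the division by the third coordinate. The function name `project` and all numeric values are made up for illustration; they do not come from OpenMVG or from any real camera.

```python
import numpy as np

def project(K, P):
    """Project a 3D point P (camera coordinates) to pixel coordinates."""
    p = K @ P            # homogeneous image point p = K P
    return p[:2] / p[2]  # divide by the third coordinate

# Illustrative calibration matrix (assumed values: f_x = f_y = 800, c_x = 320, c_y = 240)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Illustrative point in camera space, with Z along the camera axis
P = np.array([0.5, -0.25, 2.0])

pixel = project(K, P)  # x = 520, y = 140 for these values
print(pixel)
```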

The focal length, which I will denote f, is the distance between the camera center and the image plane. The variables

f_x=s_x*f 
f_y=s_y*f

in the matrix above express this value in units of pixel width and pixel height, respectively. The variables s_x and s_y are the scale factors mentioned on the page you cite. Each scale factor is the number of pixels (along the width or the height) per unit of length in camera space. So, for example, if a pixel is half as wide as the unit you use on the x axis of camera space, you will have s_x=2.
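
As a small worked example of the relation f_x=s_x*f, the sketch below derives the scale factors from a pixel size. The helper `scale_factor`, the choice of millimetres as the camera-space unit, and the sensor numbers are all assumptions picked for illustration.

```python
def scale_factor(pixel_size, unit=1.0):
    """Number of pixels per camera-space unit along one axis."""
    return unit / pixel_size

# The example from the text: a pixel half as wide as the camera-space unit
s_half = scale_factor(0.5)  # gives 2.0

# Hypothetical sensor: 4 mm focal length, square pixels 0.005 mm wide
f = 4.0                     # focal length in mm (assumed)
s_x = scale_factor(0.005)   # pixels per mm along x
s_y = scale_factor(0.005)   # pixels per mm along y

f_x = s_x * f               # focal length in units of pixel width
f_y = s_y * f               # focal length in units of pixel height
print(f_x, f_y)
```

With square pixels s_x and s_y coincide, so f_x and f_y are equal; they differ only when the pixels are rectangular.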

I have seen people use the term principal point to refer to different things. While some people define it as the intersection between the camera axis and the image plane (Wikipedia seems to do this), others define it as the point given by [c_x c_y]^T. For clarity's sake, let's separate the whole projection process, after dividing p by its third coordinate Z:

p/Z = [f_x*X/Z  f_y*Y/Z  0]^T + [c_x c_y 1]^T

The two terms on the right-hand side of the equation do different things. The first one scales the point and puts it into the image plane. The second term (i.e. [c_x c_y 1]^T) shifts the result of the first term. So, measured from the point where the camera axis meets the image plane, [-c_x, -c_y]^T is the origin of the image's coordinate system.
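
The split into a scaling term and a shift term can be checked numerically. This is only a sketch with made-up values for f_x, f_y, c_x, c_y and P; it computes the two terms separately and compares their sum with KP divided by Z.

```python
import numpy as np

# Illustrative intrinsics and point (assumed values)
f_x, f_y, c_x, c_y = 800.0, 800.0, 320.0, 240.0
K = np.array([[f_x, 0.0, c_x],
              [0.0, f_y, c_y],
              [0.0, 0.0, 1.0]])
P = np.array([0.5, -0.25, 2.0])
X, Y, Z = P

# First term: scales the point and puts it into the image plane
scaled = np.array([f_x * X / Z, f_y * Y / Z, 0.0])

# Second term: shifts the result by [c_x c_y 1]^T
shift = np.array([c_x, c_y, 1.0])

# Full projection, divided by the third coordinate
full = (K @ P) / Z

print(scaled + shift, full)
```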

As for the difference between tangential and radial distortion: when correcting distortion, we usually assume that the center of the image, o, remains undistorted. Under the effect of distortion, a pixel p will have "moved away" from its true position q. If that movement is along the vector q-o, the distortion is radial; if the movement has a component in a different direction, it is said to (also) have tangential distortion.
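
To illustrate the distinction, the sketch below splits the displacement p-q into a part along q-o (radial) and a part perpendicular to it (tangential). The function name `split_displacement` and the sample points are hypothetical. This is just the geometric decomposition described above, not a distortion model from OpenMVG or any other library.

```python
import numpy as np

def split_displacement(o, q, p):
    """Split the displacement p - q into radial and tangential parts.

    o: undistorted image center, q: true position, p: observed position.
    """
    o, q, p = (np.asarray(v, dtype=float) for v in (o, q, p))
    radial_dir = q - o
    norm = np.linalg.norm(radial_dir)
    if norm == 0.0:
        raise ValueError("q coincides with o, so the radial direction is undefined")
    radial_dir = radial_dir / norm

    d = p - q                                    # total movement caused by distortion
    radial = np.dot(d, radial_dir) * radial_dir  # component along q - o
    tangential = d - radial                      # whatever is left over
    return radial, tangential

o = [0.0, 0.0]
q = [3.0, 4.0]

# Movement purely along q - o: radial only
r1, t1 = split_displacement(o, q, [3.6, 4.8])

# Movement with a sideways component: radial plus tangential
r2, t2 = split_displacement(o, q, [3.2, 5.1])
print(r1, t1, r2, t2)
```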

As I said, I'm a bit unsure what the focal plane shown in their figure means, but I think the term usually refers to the plane on which the upside-down image would form in a physical pinhole camera. A point P on the image plane (expressed in coordinates whose origin is the camera center) would just be -P on the focal plane.