Tags: computer-vision, linear-algebra, structure-from-motion

How to calculate the relative scale from two fundamental matrices that share a camera view?


I understand that the geometry recovered from the fundamental matrix FAB between two camera views A and B is only defined up to scale (i.e., you can't tell whether you're looking at small trees up close or larger trees farther away).

However, given three points a, b, c and two fundamental matrices FAB and FBC, it should be possible to relate their relative scales. My first thought is to choose two features that exist in all three views, calculate their distance using both FAB and FBC, and divide the results. Maybe average over all features that exist in all three views? Am I on the right track, or is there a better way to do this?


Solution

  • If you know the intrinsic parameters of the cameras, you can estimate the 3D points with triangulation. With a known distance d(a, b) between points a and b, you can then estimate the scale factor s directly with s = d(a, b) / d(a', b'), where d(a', b') is the distance between the triangulated points. If an arbitrary scale is acceptable, you can use the distance d(a', b') from the other pair as the reference instead. For robustness, calculate the scale factor over several point pairs and use the average (or median) as the final scale factor.

    If you have enough point correspondences, you can use bundle adjustment to refine the parameters further. Rotation and translation can be recovered from the essential matrix, which in turn can be computed from the camera intrinsic matrices and the fundamental matrix (E = K'ᵀ F K).
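A minimal NumPy sketch of the distance-ratio idea, assuming a shared intrinsic matrix K and that the unit-baseline poses (R, t) have already been recovered from each essential matrix E = KᵀFK (e.g., with OpenCV's `cv2.recoverPose`). All function and variable names here are illustrative; the points `pts_A`, `pts_B`, `pts_C` are pixel coordinates of features visible in all three views:

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one correspondence from pixel coords."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def relative_scale(K, R_AB, t_AB, R_BC, t_BC, pts_A, pts_B, pts_C):
    """Scale factor s that brings the (B,C) reconstruction to the (A,B) scale.

    R_*, t_* are the unit-baseline poses recovered from E = K^T F K for each
    pair; pts_* are Nx2 pixel coordinates of features seen in all three views.
    """
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P_AB = K @ np.hstack([R_AB, t_AB.reshape(3, 1)])
    P_BC = K @ np.hstack([R_BC, t_BC.reshape(3, 1)])
    # Triangulate each pair in its own (arbitrary-scale) frame.
    X_AB = np.array([triangulate_point(P0, P_AB, a, b)
                     for a, b in zip(pts_A, pts_B)])   # frame of camera A
    X_BC = np.array([triangulate_point(P0, P_BC, b, c)
                     for b, c in zip(pts_B, pts_C)])   # frame of camera B
    # Bring the (A,B) points into camera B's frame so both sets are comparable.
    X_AB_in_B = (R_AB @ X_AB.T + t_AB.reshape(3, 1)).T
    # Distances between the same physical points must agree once scaled;
    # the ratio of pairwise distances gives the scale factor.
    ratios = []
    for i in range(len(X_AB_in_B)):
        for j in range(i + 1, len(X_AB_in_B)):
            d_ab = np.linalg.norm(X_AB_in_B[i] - X_AB_in_B[j])
            d_bc = np.linalg.norm(X_BC[i] - X_BC[j])
            ratios.append(d_ab / d_bc)
    return float(np.median(ratios))  # median resists outlier matches
```

The median over all point-pair ratios is the robust variant of the averaging the question proposes; with noise-free correspondences every ratio is identical.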