How are orthographic and perspective camera models used in structure from motion?
Also, how do these techniques differ from each other?
Say you have a static scene and a moving camera (or, equivalently, a rigidly moving scene and a static camera) and you want to reconstruct the scene geometry and the camera motion from two or more images. The reconstruction is usually based on point correspondences: each correspondence gives equations that must be solved for the 3D points and the camera motion.
The solution can be based either on nonlinear minimization or on various approximations. The camera can be modeled by orthographic or perspective projection. In the simplest SfM case the camera is approximated by orthographic projection (or, more generally, by weak perspective projection), and the scene can be recovered up to scale. However, translation perpendicular to the image plane (that is, along the optical axis) can never be recovered, because orthographic projection discards depth and such a translation leaves the image unchanged.
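To make the orthographic limitation concrete, here is a minimal sketch in Python with NumPy. The function name `project_orthographic` and the sample points are made up for illustration, and the camera is assumed to look along the Z axis. It shows that moving the scene along the optical axis does not change the orthographic image at all, so no algorithm could recover that motion from the images.

```python
import numpy as np

def project_orthographic(points):
    """Orthographic projection along the Z axis: (X, Y, Z) -> (X, Y)."""
    return points[:, :2]

# Hypothetical 3D points in camera coordinates (one point per row).
points = np.array([
    [0.5, -0.2, 4.0],
    [-1.0, 0.3, 5.0],
    [0.2, 0.8, 6.0],
])

# Translate the scene along the optical axis (perpendicular to the image plane).
shift_along_axis = np.array([0.0, 0.0, 2.0])
moved = points + shift_along_axis

# The two images are identical, so this translation is unobservable.
ortho_unchanged = np.allclose(project_orthographic(points),
                              project_orthographic(moved))
print("orthographic image unchanged:", ortho_unchanged)
```

A translation parallel to the image plane, by contrast, shifts every projected point and is therefore observable.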
Newer SfM methods use perspective projection, because orthographic projection cannot recover all of the information. With full perspective projection we can recover, for example, the translation along the optical axis. That is, the geometry and the full motion can be recovered up to a global scale factor.
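The following is a small sketch of both statements under a pinhole model, again in Python with NumPy. The function `project_perspective`, the focal length of 1, and the sample values are assumptions chosen for illustration. It checks that a translation along the optical axis does change the perspective image (so it is observable), and that scaling the scene and the translation by the same factor leaves the image unchanged (the global scale ambiguity).

```python
import numpy as np

def project_perspective(points, rotation, translation, focal_length=1.0):
    """Pinhole projection: move points into the camera frame, divide by depth."""
    cam = points @ rotation.T + translation
    return focal_length * cam[:, :2] / cam[:, 2:3]

# Hypothetical 3D points in world coordinates (one point per row).
points = np.array([
    [0.5, -0.2, 4.0],
    [-1.0, 0.3, 5.0],
    [0.2, 0.8, 6.0],
])
rotation = np.eye(3)                      # no rotation, for simplicity
translation = np.array([0.1, -0.2, 1.0])

base = project_perspective(points, rotation, translation)

# 1) Translation along the optical axis changes the image under perspective.
farther = translation + np.array([0.0, 0.0, 2.0])
axis_shift_visible = not np.allclose(
    base, project_perspective(points, rotation, farther))

# 2) Scaling scene and translation together gives exactly the same image.
scale = 3.0
scaled = project_perspective(scale * points, rotation, scale * translation)
scale_invisible = np.allclose(base, scaled)

print("axis translation visible:", axis_shift_visible)
print("global scale invisible:", scale_invisible)
```

The second check is why a perspective reconstruction is only defined up to a global scale factor unless some known length in the scene or the camera baseline is supplied.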