Tags: augmented-reality, arkit, arcore

What algorithm is used by ARKit or ARCore to detect planar surfaces?


I am working on simple AR applications and have played a bit with both ARCore and ARKit. I was impressed by how accurately virtual objects are positioned in the real world, since the phones I tested have neither a depth sensor nor multiple cameras. So I was wondering: what algorithm do these SDKs implement for plane detection?


Solution

  • Basically, ARCore uses a technique called Concurrent Odometry and Mapping (COM) and ARKit uses a technique called Visual-Inertial Odometry (VIO) to understand where the smartphone is relative to the real environment around it.

    At the first stage – Motion Tracking – each smartphone combines visual data coming from the RGB camera sensor (at 60 fps) with motion data coming from the accelerometer and gyroscope sensors (at 1000 fps) to compute the positions of high-contrast feature points using a parallax formula. That information allows ARCore and ARKit to derive the position and orientation of a virtual camera in six degrees of freedom (6 DoF) – a minimal sketch of reading this pose follows below, after this answer.

    The next stage – Scene Understanding – helps ARCore and ARKit work out which of those feature points (a.k.a. the sparse point cloud) are coplanar, letting the smartphone figure out where a detected plane lies (see the plane-detection sketch below).

    ARCore's last stage is Light Estimation, and ARKit's last stage is Rendering (you don't even need to import SceneKit because ARKit contains all the necessary SceneKit classes – check this out); a light-estimation sketch also follows below.

    As you can see, there's no need for a manufacturer to equip a smartphone with multiple RGB cameras or a rear depth camera, because a single RGB camera plus the IMU sensors can do the job.
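
The exact feature trackers and filters inside VIO and COM aren't public, but you can inspect their output directly. Below is a minimal ARKit sketch (Swift; the helper name `logTrackingState` is made up for illustration) that reads the 6-DoF camera transform and the sparse feature-point cloud for the current frame:

```swift
import ARKit

// Hypothetical helper: print the camera pose and sparse point cloud
// that visual-inertial odometry has produced for the latest frame.
func logTrackingState(of session: ARSession) {
    guard let frame = session.currentFrame else { return }

    // 4x4 world transform of the device camera – position + orientation (6 DoF).
    let transform = frame.camera.transform
    let position = transform.columns.3
    print("Camera position: \(position.x), \(position.y), \(position.z)")

    // High-contrast feature points tracked in the scene (sparse point cloud).
    if let pointCloud = frame.rawFeaturePoints {
        print("Tracked feature points: \(pointCloud.points.count)")
    }
}
```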
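
For Scene Understanding, neither SDK documents the plane-fitting algorithm itself; what you get as a developer are plane anchors once enough coplanar points have been found. Here is a minimal sketch of turning plane detection on in ARKit and receiving the resulting ARPlaneAnchor objects (the observer class name is an assumption):

```swift
import ARKit

// Minimal sketch: enable plane detection and observe the plane anchors
// ARKit creates when it finds groups of coplanar feature points.
final class PlaneDetectionObserver: NSObject, ARSessionDelegate {

    func start(_ session: ARSession) {
        let configuration = ARWorldTrackingConfiguration()
        // Ask ARKit to group coplanar feature points into plane anchors.
        configuration.planeDetection = [.horizontal, .vertical]
        session.delegate = self
        session.run(configuration)
    }

    // Called whenever ARKit adds anchors, including detected planes.
    func session(_ session: ARSession, didAdd anchors: [ARAnchor]) {
        for case let plane as ARPlaneAnchor in anchors {
            print("Plane detected – center: \(plane.center), extent: \(plane.extent)")
        }
    }
}
```

ARCore exposes the equivalent result as `Plane` trackables on its `Session`/`Frame` API on Android.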
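
And for the Light Estimation stage, both SDKs hand you a per-frame estimate rather than raw sensor data. A small sketch of applying ARKit's estimate to a SceneKit light (importing ARKit alone is enough here, as mentioned above):

```swift
import ARKit

// Small sketch: copy ARKit's per-frame light estimate onto a SceneKit light.
func applyLightEstimate(from frame: ARFrame, to light: SCNLight) {
    guard let estimate = frame.lightEstimate else { return }
    // Ambient intensity in lumens (~1000 for a well-lit scene)
    // and color temperature in kelvin (~6500 K is pure white).
    light.intensity = estimate.ambientIntensity
    light.temperature = estimate.ambientColorTemperature
}
```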