iOS11 ARKit: Can ARKit also capture the Texture of the user's face?

I read the whole documentation on all ARKit classes up and down. I don't see any place that describes ability to actually get the user face's Texture.

ARFaceAnchor contains the ARFaceGeometry (topology and geometry comprised of vertices) and the BlendShapeLocation array (coordinates allowing manipulations of individual facial traits by manipulating geometric math on the user face's vertices).

But where can I get the actual Texture of the user's face. For example: the actual skin tone / color / texture, facial hair, other unique traits, such as scars or birth marks? Or is this not possible at all?

Solution

You want a texture-map-style image for the face? There’s no API that gets you exactly that, but all the information you need is there:

ARFrame.capturedImage gets you the camera image.
ARFaceGeometry gets you a 3D mesh of the face.
ARAnchor and ARCamera together tell you where the face is in relation to the camera, and how the camera relates to the image pixels.

So it’s entirely possible to texture the face model using the current video frame image. For each vertex in the mesh...

Convert the vertex position from model space to camera space (use the anchor’s transform)
Multiply with the camera projection with that vector to get to normalized image coordinates
Divide by image width/height to get pixel coordinates

This gets you texture coordinates for each vertex, which you can then use to texture the mesh using the camera image. You could do this math either all at once to replace the texture coordinate buffer ARFaceGeometry provides, or do it in shader code on the GPU during rendering. (If you’re rendering using SceneKit / ARSCNView you can probably do this in a shader modifier for the geometry entry point.)

If instead you want to know for each pixel in the camera image what part of the face geometry it corresponds to, it’s a bit harder. You can’t just reverse the above math because you’re missing a depth value for each pixel... but if you don’t need to map every pixel, SceneKit hit testing is an easy way to get geometry for individual pixels.

If what you’re actually asking for is landmark recognition — e.g. where in the camera image are the eyes, nose, beard, etc — there’s no API in ARKit for that. The Vision framework might help.