Tags: android, opengl-es, gpgpu, arcore

How best to access ARCore video frame in compute shader


I'm implementing a technology that uses ARCore (currently 1.16.0) as well as custom image processing in OpenGL ES 3.2 compute shaders.

There are two ways I can access the video frame from compute shaders, both with drawbacks:

  1. Using Session.setCameraTextureName(). This seems like a natural option, since I'm letting ARCore get the data into an OpenGL texture for me. However, it's in RGBA, which presumably means an unwanted lossy conversion happens before I get the data. Also, since it's a GL_TEXTURE_EXTERNAL_OES texture, I can't access pixels directly (using imageLoad()) and have to go via a sampler. Finally, on the devices I've tested, the texture resolution is 1920x1080 for every CameraConfig, which is too much for my compute algorithm and forces me to downscale.
  2. Using Frame.acquireCameraImage(). The pros here are that I get the imagery in YCbCr, presumably skipping a lossy conversion step, and that I can select an appropriate resolution. The drawback is that I have to upload the data myself with two glTexSubImage2D() calls (a rough sketch of this upload path is shown below).

(Anecdotally, on Samsung S8 and A3, the texture resolution for all configs is 1920x1080, and the image resolutions are 640x480, 1280x720 and 1920x1080.)
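
To make option 2 concrete, here is a rough, hypothetical sketch of that upload path. The names (uploadCameraImage, yTex, uvTex) are placeholders; it assumes yTex and uvTex were pre-allocated with glTexStorage2D as GL_R8 (full resolution) and GL_RG8 (half resolution), and that the image uses the common YUV_420_888 layout in which the chroma samples are interleaved with a pixel stride of 2:

    import android.media.Image;
    import com.google.ar.core.Frame;
    import com.google.ar.core.exceptions.NotYetAvailableException;

    import static android.opengl.GLES32.*;

    class CameraImageUploadSketch {
        // Uploads the current camera image into two pre-allocated textures:
        //   yTex  = GL_R8,  full resolution (luma)
        //   uvTex = GL_RG8, half resolution (interleaved Cb/Cr)
        static void uploadCameraImage(Frame frame, int yTex, int uvTex) {
            try (Image image = frame.acquireCameraImage()) {
                Image.Plane yPlane = image.getPlanes()[0];
                Image.Plane cbPlane = image.getPlanes()[1]; // Cb, typically interleaved with Cr
                int width = image.getWidth();
                int height = image.getHeight();

                // Luma plane: one byte per pixel.
                glBindTexture(GL_TEXTURE_2D, yTex);
                glPixelStorei(GL_UNPACK_ROW_LENGTH, yPlane.getRowStride());
                glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                        GL_RED, GL_UNSIGNED_BYTE, yPlane.getBuffer());

                // Chroma: with a pixel stride of 2, Cb and Cr are interleaved, so the Cb
                // plane's buffer can be uploaded as one half-resolution two-channel texture.
                // (On some devices this buffer is one byte short of a full final row, so a
                // real implementation may need to repack it into a padded buffer first.)
                if (cbPlane.getPixelStride() == 2) {
                    glBindTexture(GL_TEXTURE_2D, uvTex);
                    glPixelStorei(GL_UNPACK_ROW_LENGTH, cbPlane.getRowStride() / 2);
                    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width / 2, height / 2,
                            GL_RG, GL_UNSIGNED_BYTE, cbPlane.getBuffer());
                }
                // else: fully planar chroma would need a separate upload per plane.

                glPixelStorei(GL_UNPACK_ROW_LENGTH, 0);
            } catch (NotYetAvailableException e) {
                // No camera image available for this frame yet; skip processing.
            }
        }
    }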

It seems like ARCore + compute is a bit of a grey area. Any suggestions as to which is the better option are welcome, but please cite sources. :)

EDIT: Adding some concrete questions based on feedback:

  • What path is generally used behind the scenes to go from YCbCr image in main memory to RGBA OpenGL ES texture?
  • Is this path faster than glTexSubImage2D()?
  • Am I paying the computational cost for this regardless of whether I'm setting the session's texture name?
  • Does the main-memory YCbCr image always come before the RGBA texture in the conversion chain?

Solution

  • What path is generally used behind the scenes to go from YCbCr image in main memory to RGBA OpenGL ES texture?

    Using GL_TEXTURE_EXTERNAL_OES is basically zero-copy access. The GPU can directly access the YUV image written by the camera/video block, and the color conversion happens on the fly as the texture is sampled (a small compute-shader sketch of this sampler-based access is included at the end of this answer). There may be some GPU overhead in doing this, depending on the hardware, but there is no second RGB copy created in memory. As for how much overhead: modern mobile GPUs can do this very efficiently, but older ones might pay roughly twice the texturing cost for two-plane YUV.

  • Is this path faster than glTexSubImage2D()?

    In general, yes; that's why the direct access path exists. However, the fact that you would have to inject an additional downsampling pass in one case and not the other means there isn't an obvious "right answer" here.

    Also note that the costs are of different kinds: direct YUV-to-RGB conversion incurs GPU cost, while texture upload incurs CPU cost. Depending on the relative CPU/GPU performance of your device, and on what resources the rest of your application is using, one of these may be less painful than the other.

  • Am I paying the computational cost for this regardless of whether I'm setting the session's texture name?

    No. The YUV-to-RGB conversion is done in the sampler when the texture is accessed.

  • Does the main-memory YCbCr image always come before the RGBA texture in the conversion chain?

    The RGBA texture is entirely virtual; it doesn't exist in memory.

  • Regarding "the texture resolution for all configs is 1920x1080, and the image resolutions are 640x480, 1280x720 and 1920x1080":

    Note that there is a lot of vendor-specific "magic" in the camera HAL layer. It's impossible to say how the lower resolution images are created; they could be generated at lower resolution directly by the camera, or they might be down-scaled in a separate pass by the vendor HAL (usually using a hardware scaler, if the camera can't do it directly).
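
    For illustration, here is a minimal, hypothetical compute-shader sketch of the sampler-based path (uCamera, uOutput and the helper dispatch() are placeholder names; GL_OES_EGL_image_external_essl3 is assumed to be available, and outTex is assumed to be an immutable GL_RGBA8 texture). The only explicit work is a normalized texture() lookup: the YUV-to-RGB conversion is hidden inside the external sampler, and sampling at the output resolution also gives a (crude, bilinear) downscale:

        import static android.opengl.GLES32.*;
        import static android.opengl.GLES11Ext.GL_TEXTURE_EXTERNAL_OES;

        class CameraComputeSketch {
            // ES 3.2 compute shader that samples the ARCore external camera texture.
            static final String COMPUTE_SRC =
                    "#version 320 es\n"
                    + "#extension GL_OES_EGL_image_external_essl3 : require\n"
                    + "layout(local_size_x = 8, local_size_y = 8) in;\n"
                    // External sampler: YUV-to-RGB conversion happens inside it, and
                    // imageLoad() is not allowed, so a normalized texture() lookup is used.
                    + "uniform highp samplerExternalOES uCamera;\n"
                    + "layout(rgba8, binding = 0) writeonly uniform highp image2D uOutput;\n"
                    + "void main() {\n"
                    + "    ivec2 dst = ivec2(gl_GlobalInvocationID.xy);\n"
                    + "    ivec2 size = imageSize(uOutput);\n"
                    + "    if (dst.x >= size.x || dst.y >= size.y) return;\n"
                    + "    vec2 uv = (vec2(dst) + 0.5) / vec2(size);\n"
                    + "    vec3 rgb = texture(uCamera, uv).rgb; // on-the-fly YUV->RGB\n"
                    + "    imageStore(uOutput, dst, vec4(rgb, 1.0));\n"
                    + "}\n";

            // Call once per frame, after Session.setCameraTextureName(cameraTex) and
            // Session.update(). 'program' is the linked compute program built from
            // COMPUTE_SRC; uCamera is left at its default texture unit 0; outTex is an
            // immutable GL_RGBA8 texture of size outW x outH.
            static void dispatch(int program, int cameraTex, int outTex, int outW, int outH) {
                glUseProgram(program);
                glActiveTexture(GL_TEXTURE0);
                glBindTexture(GL_TEXTURE_EXTERNAL_OES, cameraTex);
                glBindImageTexture(0, outTex, 0, false, 0, GL_WRITE_ONLY, GL_RGBA8);
                glDispatchCompute((outW + 7) / 8, (outH + 7) / 8, 1);
                // Make the image writes visible to whatever reads outTex next.
                glMemoryBarrier(GL_SHADER_IMAGE_ACCESS_BARRIER_BIT);
            }
        }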