Tags: c#, unity-game-engine, augmented-reality, arcore, yuv

ARCore – How to convert YUV camera frame to RGB frame in C#?


I want to get the current camera image from the ARCore session. I am using the Frame.CameraImage.AcquireCameraImageBytes() method to get this image. Then I load this image into a Texture2D with format TextureFormat.R8. But the texture is red and upside down. I know that the format ARCore uses is YUV, but I could not find a way to convert this format to RGB. How can I do this?

There are two or three existing questions about this issue, but no solution is given.

Code is given below:

CameraImageBytes image = Frame.CameraImage.AcquireCameraImageBytes();
int width = image.Width;
int height = image.Height;
int size = width * height;

Texture2D texture = new Texture2D(width, height, TextureFormat.R8, false, false);
byte[] m_EdgeImage = new byte[size];

System.Runtime.InteropServices.Marshal.Copy(image.Y, m_EdgeImage, 0, size);

texture.LoadRawTextureData(m_EdgeImage);
texture.Apply();

Result image:

[Screenshot omitted: the texture renders solid red and upside down.]


Solution

  • In the code you included, you are copying the Y channel of the camera image (image.Y) into a single-channel texture (TextureFormat.R8), without doing any conversion.

    YUV and RGB both have three channels, but you are using only one. In RGB the channels usually all have the same size, but in YUV they are often different: U and V can be a fraction of the size of Y, with the specific fraction depending on the format used. With the 2x2 subsampling used here, U and V each hold a quarter as many samples as Y.

    Since this texture is coming from the Android camera, the specific format should be Y′UV420p, which is a planar format; see the Wikipedia page for a useful visual representation of how the channel values are grouped: Single Frame YUV420.

    The CameraImageBytes API structure requires you to extract the channels separately, and then put them back together again programmatically.
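
    For example, the Y plane can be pulled into its own managed buffer like this (a sketch; it assumes the GoogleARCore CameraImageBytes struct also exposes IsAvailable and YRowStride, as in the ARCore Unity SDK):

    using (CameraImageBytes image = Frame.CameraImage.AcquireCameraImageBytes())
    {
        if (image.IsAvailable)
        {
            int width = image.Width;
            int height = image.Height;
            byte[] yBuffer = new byte[width * height];

            // Copy the Y plane row by row, respecting YRowStride, since the native
            // rows can be padded to be wider than the image.
            for (int row = 0; row < height; row++)
            {
                System.IntPtr rowPtr = new System.IntPtr(image.Y.ToInt64() + row * image.YRowStride);
                System.Runtime.InteropServices.Marshal.Copy(rowPtr, yBuffer, row * width, width);
            }

            // U and V are subsampled and strided differently; they are merged into
            // a separate interleaved buffer in the snippet below.
        }
    }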

    FYI, there is an easier way to get a camera texture that has already been converted to RGB, but it can only be accessed through a shader, not from C# code.

    Assuming you still want to do this in C#: to gather all the channels from the YUV image, you need to treat the U and V channels differently from the Y channel and create a separate buffer for them. There is an example of how to do this in an issue on the Unity-Technologies/experimental-ARInterface GitHub repo:

    //We expect 2 bytes per pixel, interleaved U/V, with 2x2 subsampling
    bufferSize = imageBytes.Width * imageBytes.Height / 2;
    cameraImage.uv = new byte[bufferSize];
    
    //Because U and V planes are returned separately, while remote expects interleaved U/V
    //same as ARKit, we merge the buffers ourselves
    unsafe
    {
        fixed (byte* uvPtr = cameraImage.uv)
        {
            byte* UV = uvPtr;
    
            byte* U = (byte*) imageBytes.U.ToPointer();
            byte* V = (byte*) imageBytes.V.ToPointer();
    
            for (int i = 0; i < bufferSize; i += 2)
            {
                *UV++ = *U;
                *UV++ = *V;
    
                U += imageBytes.UVPixelStride;
                V += imageBytes.UVPixelStride;
            }
        }
    }
    

    This code will produce raw texture data that can be loaded into a Texture2D of format TextureFormat.RG16:

    Texture2D texUVchannels = new Texture2D(imageBytes.Width / 2, imageBytes.Height / 2, TextureFormat.RG16, false, false);
    texUVchannels.LoadRawTextureData(cameraImage.uv);
    texUVchannels.Apply();
    

    Now that you have all three channels stored in two Texture2D objects, you can convert them either through a shader or in C#.
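
    If you go the shader route, feeding the two textures to the conversion material is just a couple of SetTexture calls (a sketch; the _YTex and _UVTex property names are illustrative and must match whatever names your shader declares):

    // Sketch: hand the Y (R8) and interleaved U/V (RG16) textures to the material
    // whose shader performs the YUV-to-RGB conversion. Property names are placeholders.
    mat.SetTexture("_YTex", texture);         // R8 texture holding the Y plane
    mat.SetTexture("_UVTex", texUVchannels);  // RG16 texture holding interleaved U/V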

    The specific conversion formula to use for the Android camera's YUV image can be found on the YUV Wikipedia page:

    void YUVImage::yuv2rgb(uint8_t yValue, uint8_t uValue, uint8_t vValue,
            uint8_t *r, uint8_t *g, uint8_t *b) const {
        int rTmp = yValue + (1.370705 * (vValue-128));
        int gTmp = yValue - (0.698001 * (vValue-128)) - (0.337633 * (uValue-128));
        int bTmp = yValue + (1.732446 * (uValue-128));
        *r = clamp(rTmp, 0, 255);
        *g = clamp(gTmp, 0, 255);
        *b = clamp(bTmp, 0, 255);
    }
    

    Translated into a Unity shader, that would be:

    float3 YUVtoRGB(float3 c)
    {
        float yVal = c.x;
        float uVal = c.y;
        float vVal = c.z;
    
        float r = yVal + 1.370705 * (vVal - 0.5);
        float g = yVal - 0.698001 * (vVal - 0.5) - (0.337633 * (uVal - 0.5));
        float b = yVal + 1.732446 * (uVal - 0.5);
    
        return float3(r, g, b);
    }
    
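
    If you would rather do the conversion on the CPU instead, the same formula can be applied in C# to build an RGB texture directly from the raw buffers. The following is a sketch inside a Unity script (using UnityEngine), assuming yBuffer holds the Y plane and uvBuffer the interleaved U/V bytes built above, with one U/V pair per 2x2 block of pixels and no row padding; it is far slower than the shader for full-resolution frames:

    // CPU-side sketch of the same YUV-to-RGB formula. yBuffer and uvBuffer are
    // assumed to be the buffers built in the earlier snippets.
    Texture2D YuvToRgb(byte[] yBuffer, byte[] uvBuffer, int width, int height)
    {
        var colors = new Color32[width * height];

        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int yValue = yBuffer[y * width + x];

                // U/V are 2x2 subsampled and interleaved: one (U,V) pair per 2x2 block.
                int uvIndex = (y / 2) * (width / 2) + (x / 2);
                int uValue = uvBuffer[uvIndex * 2];
                int vValue = uvBuffer[uvIndex * 2 + 1];

                int r = (int)(yValue + 1.370705f * (vValue - 128));
                int g = (int)(yValue - 0.698001f * (vValue - 128) - 0.337633f * (uValue - 128));
                int b = (int)(yValue + 1.732446f * (uValue - 128));

                colors[y * width + x] = new Color32(
                    (byte)Mathf.Clamp(r, 0, 255),
                    (byte)Mathf.Clamp(g, 0, 255),
                    (byte)Mathf.Clamp(b, 0, 255),
                    255);
            }
        }

        var tex = new Texture2D(width, height, TextureFormat.RGBA32, false);
        tex.SetPixels32(colors);
        tex.Apply();
        return tex;
    }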

    The texture obtained this way is of a different size compared to the background video coming from ARCore, so if you want them to match on screen, you'll need to use the UVs and other data coming from Frame.CameraImage.

    So to pass the UVs to the shader:

    var uvQuad = Frame.CameraImage.ImageDisplayUvs;
    mat.SetVector("_UvTopLeftRight",
                  new Vector4(uvQuad.TopLeft.x, uvQuad.TopLeft.y, uvQuad.TopRight.x, uvQuad.TopRight.y));
    mat.SetVector("_UvBottomLeftRight",
                  new Vector4(uvQuad.BottomLeft.x, uvQuad.BottomLeft.y, uvQuad.BottomRight.x, uvQuad.BottomRight.y));
    
    camera.projectionMatrix = Frame.CameraImage.GetCameraProjectionMatrix(camera.nearClipPlane, camera.farClipPlane);
    

    and to use them in the shader, you'll need to lerp them as in the EdgeDetectionBackground shader.

    In that same shader you'll also find an example of how to access the RGB camera image directly, without doing any conversion, which may turn out to be easier for your use case.

    There are a few requirements for that:

    • the shader must be written in GLSL
    • it can only be done on OpenGL ES 3
    • the GL_OES_EGL_image_external_essl3 extension needs to be supported