Tags: unity-game-engine, ffmpeg, texture2d

Unity: Converting Texture2D to YUV420P using FFmpeg


I'm trying to create a game in Unity where each frame is rendered into a texture and then assembled into a video using FFmpeg. The output should eventually be sent over the network to a client UI. However, I'm mainly struggling with the step where a captured frame is passed as a byte array to an unsafe method for further processing by FFmpeg. The wrapper I'm using is FFmpeg.AutoGen.

The render to texture method:

private IEnumerator CaptureFrame()
{
    // Wait until the frame has finished rendering.
    yield return new WaitForEndOfFrame();

    // Read the active render texture back into the CPU-side Texture2D.
    RenderTexture.active = rt;
    frame.ReadPixels(rect, 0, 0);
    frame.Apply();

    bytes = frame.GetRawTextureData();

    EncodeAndWrite(bytes, bytes.Length);
}
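
Note that GetRawTextureData only returns the tightly packed 4-byte RGBA pixels assumed below if frame was created with TextureFormat.RGBA32, e.g.:

// RGBA32 gives 4 bytes per pixel, matching the 4 * textureWidth stride used later.
frame = new Texture2D(textureWidth, textureHeight, TextureFormat.RGBA32, false);
rect = new Rect(0, 0, textureWidth, textureHeight);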

The unsafe encoding method so far:

private unsafe void EncodeAndWrite(byte[] bytes, int size)
{
    // Pin the managed array so the GC can't move it while native code reads it.
    GCHandle pinned = GCHandle.Alloc(bytes, GCHandleType.Pinned);
    IntPtr address = pinned.AddrOfPinnedObject();

    sbyte** inData = (sbyte**)address;
    fixed(int* lineSize = new int[1])
    {
        lineSize[0] = 4 * textureWidth; // RGBA stride in bytes
        // Convert RGBA to YUV420P
        ffmpeg.sws_scale(sws, inData, lineSize, 0, codecContext->width, inputFrame->extended_data, inputFrame->linesize);
    }

    inputFrame->pts = frameCounter++;

    if(ffmpeg.avcodec_send_frame(codecContext, inputFrame) < 0)
        throw new ApplicationException("Error sending a frame for encoding!");

    pkt = new AVPacket();
    fixed(AVPacket* packet = &pkt)
        ffmpeg.av_init_packet(packet);
    pkt.data = null;
    pkt.size = 0;

    pinned.Free();
    ...
}

sws_scale takes a sbyte** as its second parameter, so I'm trying to convert the input byte array to sbyte** by first pinning it with GCHandle and then applying an explicit cast. I don't know if that's the correct way, though.
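
As far as I can tell, the pinned address points at the pixel bytes themselves, so casting it straight to sbyte** makes sws_scale read the first pixels as if they were a pointer. Wrapping the single plane pointer instead might look like this (an untested sketch matching the wrapper's sbyte** signature; note that sws_scale's fifth argument is the source slice height, so codecContext->height would be expected there rather than width):

// The pinned address is one plane pointer; sws_scale wants an array of plane pointers.
sbyte* planePtr = (sbyte*)address;
sbyte** inData = &planePtr;
fixed (int* lineSize = new int[1])
{
    lineSize[0] = 4 * textureWidth; // RGBA: 4 bytes per pixel
    // Fifth argument is the source slice height, not the width.
    ffmpeg.sws_scale(sws, inData, lineSize, 0, codecContext->height,
                     inputFrame->extended_data, inputFrame->linesize);
}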

Moreover, the condition if(ffmpeg.avcodec_send_frame(codecContext, inputFrame) < 0) always throws an ApplicationException, and I really don't know why this happens.
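
To get more detail than a bare negative return code, the error could be decoded with av_strerror, e.g. (an untested sketch, assuming FFmpeg.AutoGen's byte* signature for av_strerror):

int ret = ffmpeg.avcodec_send_frame(codecContext, inputFrame);
if (ret < 0)
{
    // Untested: turn the FFmpeg error code into readable text.
    byte* errBuf = stackalloc byte[256];
    ffmpeg.av_strerror(ret, errBuf, 256);
    throw new ApplicationException(Marshal.PtrToStringAnsi((IntPtr)errBuf));
}

codecContext and inputFrame are my AVCodecContext and AVFrame objects, respectively, and their fields are defined as follows: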

codecContext

codecContext = ffmpeg.avcodec_alloc_context3(codec);
codecContext->bit_rate = 400000;
codecContext->width = textureWidth;
codecContext->height = textureHeight;

AVRational timeBase = new AVRational();
timeBase.num = 1;
timeBase.den = (int)fps;
codecContext->time_base = timeBase;
videoAVStream->time_base = timeBase;

AVRational frameRate = new AVRational();
frameRate.num = (int)fps;
frameRate.den = 1;
codecContext->framerate = frameRate;

codecContext->gop_size = 10;
codecContext->max_b_frames = 1;
codecContext->pix_fmt = AVPixelFormat.AV_PIX_FMT_YUV420P;
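
One thing worth double-checking before any frame is sent: the context has to be opened with avcodec_open2, e.g. (a sketch, assuming codec is the AVCodec* found earlier):

// If the encoder was never opened, avcodec_send_frame fails for every frame.
if (ffmpeg.avcodec_open2(codecContext, codec, null) < 0)
    throw new ApplicationException("Could not open codec!");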

inputFrame

inputFrame = ffmpeg.av_frame_alloc();
inputFrame->format = (int)codecContext->pix_fmt;
inputFrame->width = textureWidth;
inputFrame->height = textureHeight;
inputFrame->linesize[0] = inputFrame->width;
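
A likely culprit for the failing avcodec_send_frame: nothing here allocates the frame's data planes, and setting linesize[0] by hand doesn't create them. A sketch of the missing allocation (av_frame_get_buffer also fills in all three YUV420P plane pointers and linesizes, which would make the manual linesize assignment above unnecessary):

// Allocate the Y, U and V planes; 32 is a common buffer alignment.
if (ffmpeg.av_frame_get_buffer(inputFrame, 32) < 0)
    throw new ApplicationException("Could not allocate frame data!");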

Any help in fixing the issue would be greatly appreciated :)


Solution

  • Check the examples here: https://github.com/FFmpeg/FFmpeg/tree/master/doc/examples

    Especially scaling_video.c. In FFmpeg, scaling and pixel format conversion are the same operation (keep the size parameters the same to perform only a pixel format conversion).

    These examples are very easy to follow. Give them a try.
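
    Translated to the question's FFmpeg.AutoGen setup, creating the conversion context might look like this (a sketch; textureWidth, textureHeight and the RGBA source format are taken from the question, and SWS_BILINEAR is an arbitrary flag choice since the size doesn't change):

    // Same dimensions in and out, so this context only converts the pixel format.
    sws = ffmpeg.sws_getContext(
        textureWidth, textureHeight, AVPixelFormat.AV_PIX_FMT_RGBA,    // source: Unity RGBA frames
        textureWidth, textureHeight, AVPixelFormat.AV_PIX_FMT_YUV420P, // destination: encoder input
        ffmpeg.SWS_BILINEAR, null, null, null);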