ffmpeg libavcodec libav

FFMPEG / libav: How is UYVY422 written inside AVFrame structure?


I'm trying to copy frame data from an AVFrame structure to a buffer. I know how to do that with the YUV420P format: Y data is stored in frame->data[0], U data in frame->data[1] and V data in frame->data[2]. Since it's a planar format, it was easy to memcpy() the Y, U and V data separately:

for (y = 0; y < height; y++)
{
    memcpy(buffer + y*frame->linesize[0], frame->data[0] + y*frame->linesize[0], width);
}

buffer += ySize;

for (y = 0; y < height / 2; y++)
{
    memcpy(buffer + y*frame->linesize[1], frame->data[1] + y*frame->linesize[1], width / 2);
}

buffer += uSize;

for (y = 0; y < height / 2; y++)
{
    memcpy(buffer + y*frame->linesize[2], frame->data[2] + y*frame->linesize[2], width / 2);
}

But when it comes to UYVY422, I have no idea how the data is stored inside the structure. I know in general that UYVY422 is laid out as its name suggests: UYVYUYVYUYVY... and so on. My question is: how do I know how much data is stored in frame->data[0], frame->data[1] and frame->data[2], so that I can memcpy() the exact amount to the buffer?


Solution

  • For UYVY, the data is stored exclusively in frame->data[0], and per line you should copy width * 2 bytes:

    for (y = 0; y < height; y++)
    {
        memcpy(output_buffer + y*frame->linesize[0],
               frame->data[0] + y*frame->linesize[0], width * 2);
    }
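Note that the loop above writes each row at y*frame->linesize[0] in the output, so any padding at the end of the source rows is kept. If the destination should be tightly packed instead, source and destination need separate strides. A minimal sketch of that idea, using plain pointers so it does not depend on the FFmpeg headers; copy_rows and its parameter names are hypothetical, not part of libav:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy `height` rows of `row_bytes` payload bytes each from a strided
 * source into a strided destination.
 * For UYVY422, row_bytes would be width * 2 and src_stride would be
 * frame->linesize[0]; dst_stride equals row_bytes for a packed output.
 * (copy_rows is a hypothetical helper, not a libav function.)
 */
void copy_rows(uint8_t *dst, ptrdiff_t dst_stride,
               const uint8_t *src, ptrdiff_t src_stride,
               size_t row_bytes, int height)
{
    assert(dst != NULL && src != NULL);
    for (int y = 0; y < height; y++) {
        /* Only the payload is copied; source padding is skipped. */
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
    }
}
```

With an AVFrame this would presumably be called as copy_rows(out, width * 2, frame->data[0], frame->linesize[0], (size_t)width * 2, height). libavutil also ships av_image_copy_plane() and av_image_copy_to_buffer() in libavutil/imgutils.h, which cover the same ground and are probably the better choice in real code.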
    

    There's a way to derive this programmatically, in case you're interested. Each AVPixelFormat has an AVPixFmtDescriptor that describes its packing in AVFrame->data[]. To get yours, use av_pix_fmt_desc_get(AV_PIX_FMT_UYVY422). The returned item is the uyvy422 entry of the descriptor table in libavutil/pixdesc.c (see also the struct reference for AVComponentDescriptor). You'll see that desc->nb_components is 3 and desc->log2_chroma_w is 1, which means U/V are subsampled by a factor of 2 horizontally (the value is a log2 shift), and that desc->comp[0..2].plane are all 0, which means all data is in AVFrame->data[0]. The offset/step/depth fields in desc->comp[0..2] tell you the rest, in case you want a fully dynamic way of reading any pix_fmt. I don't think you personally need it, but at the very least it allows anyone to derive the packing of any pix_fmt in AVFrame->data[].
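To make the offset/step part more concrete, here is a small sketch that reads individual samples out of one UYVY422 row. It deliberately avoids the FFmpeg headers: struct comp_layout is a simplified, hypothetical stand-in for AVComponentDescriptor, and the table values are what I'd expect the uyvy422 descriptor to contain (byte order U0 Y0 V0 Y1), so please check them against av_pix_fmt_desc_get() before relying on them:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the plane/step/offset fields of AVComponentDescriptor. */
struct comp_layout {
    int plane;   /* index into AVFrame->data[] */
    int step;    /* bytes between two consecutive samples of this component */
    int offset;  /* bytes before the first sample of this component in a row */
};

/* Assumed values for uyvy422; verify against the real descriptor. */
static const struct comp_layout uyvy422_comp[3] = {
    { 0, 2, 1 },  /* Y */
    { 0, 4, 0 },  /* U */
    { 0, 4, 2 },  /* V */
};
enum { UYVY422_LOG2_CHROMA_W = 1 };

/* Read 8-bit component c (0 = Y, 1 = U, 2 = V) for luma column x of one row. */
uint8_t uyvy422_sample(const uint8_t *row, int x, int c)
{
    assert(row != NULL && x >= 0 && c >= 0 && c < 3);
    /* Chroma is shared by 2^log2_chroma_w luma columns. */
    int cx = (c == 0) ? x : x >> UYVY422_LOG2_CHROMA_W;
    return row[uyvy422_comp[c].offset + cx * uyvy422_comp[c].step];
}
```

Here row would be frame->data[0] + y * frame->linesize[0]. A generic reader would take the table from the real descriptor instead of hard-coding it, and would also have to honour depth and shift for formats with more than 8 bits per sample.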

    [edit] See the following example code (possibly buggy):

    #include <assert.h>
    #include <stdio.h>
    #include <libavutil/pixdesc.h>
    
    int main(int argc, char *argv[]) {
        if (argc < 2) {
            fprintf(stderr, "Usage: %s [fmt]\n", argv[0]);
            return 1;
        }
        const char *fmtname = argv[1];
        enum AVPixelFormat fmt = av_get_pix_fmt(fmtname);
        if (fmt == AV_PIX_FMT_NONE) {
            fprintf(stderr, "Unknown pixfmt %s\n", fmtname);
            return 1;
        }
        const AVPixFmtDescriptor *desc = av_pix_fmt_desc_get(fmt);
        assert(desc != NULL);
        printf("N planes: %d, %d bits/element\n", desc->nb_components, desc->comp[0].depth);
    
        int n;
        int epl[4] = { 0, 0, 0, 0 };
        int width = 0x100;
        for (n = 0; n < desc->nb_components; n++) {
            int is_y = n == 0;
            int is_a = !(desc->nb_components & 1) && n == desc->nb_components - 1;
            int h_ss = (is_y || is_a) ? 0 : desc->log2_chroma_w;
    
            epl[desc->comp[n].plane] += width >> h_ss;
        }
    
        for (n = 0; n < 4; n++) {
            int is_y = n == 0;
            int is_a = !(desc->nb_components & 1) && n == desc->nb_components - 1;
            int v_ss = (is_y || is_a) ? 0 : desc->log2_chroma_h;
    
            if (epl[n] == 0) continue;
            printf("Plane %d has %lf elements/y_pixel (horizontally) and %lf lines/y_pixel (vertically)\n",
                   n, epl[n] / (double) width, (width >> v_ss) / (double) width);
        }
    
        return 0;
    }
    

    Which gives the following output:

    $ for fmt in yuyv422 uyvy422 yuv420p yuva420p10; do /tmp/test $fmt; done
    N planes: 3, 8 bits/element
    Plane 0 has 2.000000 elements/y_pixel (horizontally) and 1.000000 lines/y_pixel (vertically)
    N planes: 3, 8 bits/element
    Plane 0 has 2.000000 elements/y_pixel (horizontally) and 1.000000 lines/y_pixel (vertically)
    N planes: 3, 8 bits/element
    Plane 0 has 1.000000 elements/y_pixel (horizontally) and 1.000000 lines/y_pixel (vertically)
    Plane 1 has 0.500000 elements/y_pixel (horizontally) and 0.500000 lines/y_pixel (vertically)
    Plane 2 has 0.500000 elements/y_pixel (horizontally) and 0.500000 lines/y_pixel (vertically)
    N planes: 4, 10 bits/element
    Plane 0 has 1.000000 elements/y_pixel (horizontally) and 1.000000 lines/y_pixel (vertically)
    Plane 1 has 0.500000 elements/y_pixel (horizontally) and 0.500000 lines/y_pixel (vertically)
    Plane 2 has 0.500000 elements/y_pixel (horizontally) and 0.500000 lines/y_pixel (vertically)
    Plane 3 has 1.000000 elements/y_pixel (horizontally) and 1.000000 lines/y_pixel (vertically)
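To turn those ratios into an amount to memcpy(), multiply them by the frame dimensions and the element size. A rough sketch, assuming even frame dimensions (so nothing needs rounding), a tightly packed destination, and that samples deeper than 8 bits are stored in whole bytes (so 10 bits occupy 2 bytes); plane_bytes is a hypothetical helper, not a libav function:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Bytes occupied by one plane without linesize padding.
 * elems_per_px and lines_per_px are the two ratios printed per plane;
 * bits_per_elem is the "bits/element" value.
 */
size_t plane_bytes(int width, int height,
                   double elems_per_px, double lines_per_px,
                   int bits_per_elem)
{
    assert(width > 0 && height > 0 && bits_per_elem > 0);
    size_t elems_per_line = (size_t)(width * elems_per_px);
    size_t lines = (size_t)(height * lines_per_px);
    size_t bytes_per_elem = (size_t)(bits_per_elem + 7) / 8;  /* round up to whole bytes */
    return elems_per_line * lines * bytes_per_elem;
}
```

For a 1920x1080 uyvy422 frame, plane_bytes(1920, 1080, 2.0, 1.0, 8) gives 1920 * 2 * 1080 bytes for plane 0, which matches the width * 2 bytes per line from the top of this answer. In real code av_image_get_buffer_size() from libavutil/imgutils.h is likely the safer way to get the total.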