
What is the structure of a video stream?


The end goal is to process the RGB data of a video.

I am trying to read the bytes of a file that I have created using ffmpeg.

ffmpeg -video_size 100x100 -framerate 20 -f x11grab -i :0.0 \
    -c:v rawvideo -pix_fmt rgb24 video.nut

I wrote a Node script to make it easier to inspect the binary data (a rough sketch of it is below). The output for my current file is:

Hex Binary   Row
47  01000111 0
40  01000000 1
11  00010001 2
10  00010000 3
03  00000011 4
00  00000000 5
00  00000000 6
00  00000000 7
68  01101000 8 
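
The script itself is just a byte dump, roughly something like this (the file name and the number of bytes printed here are only illustrative):

// dump.js - print the first few bytes of a file as hex and binary
const fs = require('fs');

const buf = fs.readFileSync('video.nut'); // file name is a placeholder
console.log('Hex Binary   Row');
for (let i = 0; i < Math.min(9, buf.length); i++) {
  const hex = buf[i].toString(16).padStart(2, '0');
  const bin = buf[i].toString(2).padStart(8, '0');
  console.log(`${hex}  ${bin} ${i}`);
}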

I have seen the spec for .nut, but I can't make sense of it. I would like to parse out the RGB data for each frame so that I am left with an RGB matrix for each "image" in the video stream. Thanks!


Solution

  • If you're ok with parsing raw video then use

    ffmpeg -video_size 100x100 -framerate 20 -f x11grab -i :0.0
           -c:v rawvideo -pix_fmt rgb24 -f rawvideo video.rgb
    

    The structure of the output will be

    R G B R G B R G B ...
    

    Each three-byte triplet represents one pixel, in scan order left to right, top to bottom. For a 100x100 frame, each row is 100 × 3 = 300 bytes, so the 301st byte is the R value of the first pixel of the 2nd row, and the 30000th byte is the B value of the pixel at the bottom-right corner. The next 30,000 bytes represent the next frame, and so on.

    This is a raw video stream, so there is no container metadata or frame encapsulation present, just an undifferentiated stream of pixel channel values that you slice into frames yourself (see the sketch below).
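
    If it helps, here is a rough Node sketch of how you could slice that stream into per-frame RGB matrices. The 100x100 size and the video.rgb file name match the command above; reading the whole file into memory is only reasonable for short captures, so treat this as a starting point rather than a finished tool.

    // parse_rgb.js - split a raw rgb24 stream into per-frame [row][col] = {r, g, b} matrices
    // Sketch only: assumes a 100x100 rgb24 capture that fits in memory.
    const fs = require('fs');

    const WIDTH = 100;
    const HEIGHT = 100;
    const FRAME_SIZE = WIDTH * HEIGHT * 3; // 30000 bytes per frame

    const buf = fs.readFileSync('video.rgb');
    const frameCount = Math.floor(buf.length / FRAME_SIZE);

    const frames = [];
    for (let f = 0; f < frameCount; f++) {
      const base = f * FRAME_SIZE;
      const frame = [];
      for (let y = 0; y < HEIGHT; y++) {
        const row = [];
        for (let x = 0; x < WIDTH; x++) {
          const i = base + (y * WIDTH + x) * 3;
          row.push({ r: buf[i], g: buf[i + 1], b: buf[i + 2] });
        }
        frame.push(row);
      }
      frames.push(frame);
    }

    console.log(`${frameCount} frames parsed`);
    console.log('top-left pixel of frame 0:', frames[0][0][0]);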