
Meaning of QVideoFrame::planeCount() and QVideoFrame::bytesPerLine


QVideoFrame::planeCount

Returns the number of planes in the video frame. This value is only valid while the frame data is mapped

QVideoFrame::bytesPerLine

Returns the number of bytes in a scan line.

I don't understand these statements very well, can someone please explain in detail?

From my webcam, I get the data below after calling QVideoFrame::map:

width=640
height=480
pixelFormat=Format_RGB32
bytesPerLine=2560
mappedBytes=122880
planeCount=1

Solution

  • A scan line is one row of the video image. (The term “scan line” was coined in a time when it was common technique that a cathode ray scanned the surface of the picture tube line by line – Scan line.)

    If bytesPerLine is divided by width (bytesPerLine / width = 2560 / 640 = 4), it yields the bytes per pixel.

    If the calculated bytes per pixel is not an integral value, then the line is aligned to a certain multiple (usually 4). A convoluted example to illustrate this:

    width=458
    pixelFormat=Format_RGB24
    bytesPerLine=1376 (a multiple of 4)
    

    1376 / 458 = 3.0043668122270742358078602620087 (according to my Windows Calculator)

    458 * 3 = 1374 → There are 2 bytes to fill up the row to the next multiple of 4 (bytes unused, should be ignored).
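    The padding rule above can be sketched as a small helper. Note this is an illustrative function of my own, not a Qt API – Qt computes the stride internally and simply reports it via bytesPerLine():

    ```cpp
    #include <cassert>

    // Hypothetical helper (not part of Qt): the stride of one row whose raw
    // size is width * bytesPerPixel, padded up to the next multiple of
    // `alignment` bytes.
    int alignedBytesPerLine(int width, int bytesPerPixel, int alignment = 4)
    {
        int raw = width * bytesPerPixel;
        return ((raw + alignment - 1) / alignment) * alignment;
    }

    int main()
    {
        // The Format_RGB24 example: 458 * 3 = 1374, padded up to 1376.
        assert(alignedBytesPerLine(458, 3) == 1376);
        // OP's Format_RGB32 frame: 640 * 4 = 2560, already a multiple of 4.
        assert(alignedBytesPerLine(640, 4) == 2560);
        return 0;
    }
    ```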

    In OP's example, this is not an issue – the size of the pixel format itself is a multiple of the row alignment.

    So, the 4 bytes per pixel match the pixel format, which states Format_RGB32, meaning:

    The frame is stored using a 32-bit RGB format (0xffRRGGBB). This is equivalent to QImage::Format_RGB32.

    as 32 bits = 4 Bytes.

    The number of planes is a bit trickier. Googling led me to Single- and multi-planar APIs:

    Some devices require data for each input or output video frame to be placed in discontiguous memory buffers. In such cases, one video frame has to be addressed using more than one memory address, i.e. one pointer per “plane”. A plane is a sub-buffer of the current frame.

    In the case of OP, there is 1 plane – the color components of a pixel are stored consecutively (packed).
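    For a packed format like OP's, a single pixel can therefore be located with one offset computation. The helper below is a sketch of my own (not a Qt function), using OP's numbers:

    ```cpp
    #include <cassert>

    // Hypothetical helper (not part of Qt): byte offset of pixel (x, y)
    // in a packed single-plane frame with the given stride.
    int packedOffset(int x, int y, int bytesPerLine, int bytesPerPixel)
    {
        return y * bytesPerLine + x * bytesPerPixel;
    }

    int main()
    {
        // OP's frame: 640x480, Format_RGB32 (4 bytes/pixel), stride 2560.
        // The last pixel (639, 479) starts 4 bytes before the end of the
        // full frame of 480 * 2560 bytes.
        assert(packedOffset(639, 479, 2560, 4) == 480 * 2560 - 4);
        // The first pixel of the second row starts exactly one stride in.
        assert(packedOffset(0, 1, 2560, 4) == 2560);
        return 0;
    }
    ```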

    Another example, where number of planes is > 1: QVideoFrame::Format_YUV420P

    The frame is stored using an 8-bit per component planar YUV format with the U and V planes horizontally and vertically sub-sampled, i.e. the height and width of the U and V planes are half that of the Y plane.

    This means that color components of a pixel are not packed together (like in Format_RGB32) but each plane stores only one color component of one pixel. To reconstruct the pixel, the corresponding color components have to be read from every plane and combined respectively.

    In the case where there are multiple planes, the second overload of bytesPerLine() should be used:

    int QVideoFrame::bytesPerLine(int plane) const

    Returns the number of bytes in a scan line of a plane.

    In the case of the above cited QVideoFrame::Format_YUV420P, the U and V components are horizontally and vertically sub-sampled, i.e. to get the pixel at (x, y) = (123, 72), the components have to be read at:

    • Y component at offset 72 * bytesPerLine(0) + 123
    • U component at offset (72 / 2) * bytesPerLine(1) + 123 / 2
    • V component at offset (72 / 2) * bytesPerLine(2) + 123 / 2.
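    The three offsets above can be sketched as follows. The helpers and the strides (640 for Y, 320 for the half-width U and V planes of a 640-pixel-wide frame) are illustrative assumptions of mine, not Qt API – in real code the strides come from bytesPerLine(plane) and may include per-plane padding:

    ```cpp
    #include <cassert>

    // Hypothetical helpers (not part of Qt): byte offsets of the components
    // of pixel (x, y) inside the planes of a Format_YUV420P frame.
    // U and V are sub-sampled by 2 in both directions, hence the / 2.
    int yOffset(int x, int y, int strideY) { return y * strideY + x; }
    int uOffset(int x, int y, int strideU) { return (y / 2) * strideU + x / 2; }
    int vOffset(int x, int y, int strideV) { return (y / 2) * strideV + x / 2; }

    int main()
    {
        // The example pixel (123, 72) with assumed strides 640 and 320.
        assert(yOffset(123, 72, 640) == 72 * 640 + 123);
        assert(uOffset(123, 72, 320) == 36 * 320 + 61);
        assert(vOffset(123, 72, 320) == 36 * 320 + 61);
        return 0;
    }
    ```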