
Compression artifacts using sws_scale(): AVFrame YUV420P -> OpenCV Mat BGR24 and back


I transcode, using C++ and FFmpeg, an H264 video in an .mp4 container to H265 video in an .mp4 container. That works perfectly, with crisp and clear images, and the codec conversion is confirmed by checking with FFprobe.

Then I call one extra function between the end of the H264 decoding and the start of the H265 encoding. At that point I have an allocated AVFrame* that I pass to that function as an argument.

The function converts the AVFrame into an OpenCV cv::Mat and back. Technically that is the easy part, yet I ran into a compression-artifact problem in the process, and I don't understand why it happens.

The function code (including a workaround for the question that follows) is as follows:

void modifyVideoFrame(AVFrame * frame)
{
    // STEP 1: WORKAROUND, overwriting AV_PIX_FMT_YUV420P BEFORE both sws_scale() functions below, solves "compression artifacts" problem;
    frame->format = AV_PIX_FMT_RGB24; 
        
    // STEP 2: Convert the FFmpeg AVFrame to an openCV cv::Mat (matrix) object.
    cv::Mat image(frame->height, frame->width, CV_8UC3);
    int clz = image.step1();

    SwsContext* context = sws_getContext(frame->width, frame->height, (AVPixelFormat)frame->format, frame->width, frame->height, AVPixelFormat::AV_PIX_FMT_BGR24, SWS_FAST_BILINEAR, NULL, NULL, NULL);
    sws_scale(context, frame->data, frame->linesize, 0, frame->height, &image.data, &clz);
    sws_freeContext(context);

    // STEP 3 : Change the pixels.
    if (false)
    {
        // TODO when "compression artifacts" problem with baseline YUV420p to BGR24 and back BGR24 to YUV420P is solved or explained and understood.
    }
    
    // UPDATE: Added VISUAL CHECK
    cv::imshow("Visual Check of Conversion AVFrame to cv::Mat", image);
    cv::waitKey(20);

    // STEP 4: Convert the openCV Mat object back to the FFmpeg AVframe.
    clz = image.step1();
    context = sws_getContext(frame->width, frame->height, AVPixelFormat::AV_PIX_FMT_BGR24, frame->width, frame->height, (AVPixelFormat)frame->format, SWS_FAST_BILINEAR, NULL, NULL, NULL);
    sws_scale(context, &image.data, &clz, 0, frame->height, frame->data, frame->linesize);
    sws_freeContext(context);
}

The code as shown, including the workaround, works perfectly, but I do NOT understand why.

Using FFprobe I established that the input pixel format is YUV420P, which is indeed the AV_PIX_FMT_YUV420P found in the frame's format field. If I convert it to BGR24 and back to YUV420P without the workaround in step 1, I get slight but clearly visible compression artifacts when viewing with VLC. So there is a loss somewhere, which is what I am trying to understand.

However, when I use the workaround in step 1, I obtain the exact same output as if this extra function weren't called (that is, crisp and clear H265 without compression artifacts). To be sure the conversion actually took place, I modified the red values (inside the part of the code that now says if (false)), and I can indeed see the changes when playing the H265 output file with VLC.

From that test it is clear that after converting the input data in the AVFrame from YUV420P to a BGR24 cv::Mat, all information needed to convert it back into the original YUV420P input data was available. Yet that is not what happens without the workaround, as the compression artifacts prove.

I used the first 17 seconds of the movie clip "Charge", encoded in H264 and available on the Blender website.

Is there anyone who has an explanation, or who can help me understand, why the code WITHOUT the workaround does not cleanly convert the input data forwards and then back into the original input data?

This is what I see: [screenshot: output with visible compression artifacts]

compared to what I see with the workaround, OR (update) in the Visual Check section (cv::imshow) if step 4 of the code is commented out: [screenshot: crisp output without artifacts]

These are the FFmpeg StreamingParams that I used on input:

copy_audio => 1
copy_video => 0
vid_codec => "libx265"
vid_video_codec_priv_key => "x265-params"
vid_codec_priv_value => "keyint=60:min-keyint=60:scenecut=0"

// Encoder output
x265 [info]: HEVC encoder version 3.5+98-753305aff
x265 [info]: build info [Windows][GCC 12.2.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-3.1 (Main tier)
x265 [info]: Thread pool 0 using 64 threads on numa nodes 0
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 1 / wpp(12 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : hex / 57 / 2 / 2
x265 [info]: Lookahead / bframes / badapt        : 15 / 4 / 0
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 3 / on / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 2 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : ABR-2000 kbps / 0.60
x265 [info]: VBV/HRD buffer / max-rate / init    : 4000 / 2000 / 0.750
x265 [info]: tools: rd=2 psy-rd=2.00 rskip mode=1 signhide tmvp fast-intra
x265 [info]: tools: strong-intra-smoothing lslices=4 deblock sao

Solution

  • The conversion from the original video data to H264 YUV420P introduces a small loss due to chroma subsampling (4:2:0, 12 bits per pixel), and the frame holds 3 planes with line sizes (linesize, linesize/2, linesize/2).

    When I convert this to RGB24 or BGR24 for use with an OpenCV cv::Mat, the original is not restored; what is restored is the original as already processed into YUV420P. Hence the RGB/BGR data starts out with a small loss, although a single step is hardly visible to the eye.

    Then, when I convert the RGB24 back to YUV420P, with or without processing, there is another round of subsampling, but this time it starts from RGB/BGR data that was already converted once and no longer matches the original. The output size remains the same (12 bits per pixel) because the YUV420P-to-RGB step restored the original size (just not the original quality). When the new YUV420P is written to file in H265 format, it passes through a codec different from H264, which might explain why, besides the compression artifacts, the image got darker, as noticed by Christoph Rackwitz in a comment.

    The degradation shown in the picture is the combined result.

    If one has no access to the original video data, the solution is to create a new FFmpeg AVFrame holding YUV444P. This doubles the raw frame size, because it uses 24 bits per pixel to hold what is effectively 12 bits per pixel of information.

    The YUV444P frame, after optional processing such as applying overlays, can then be fed to the H265 codec. Any other subsampling (4:2:2, 4:1:1, 4:1:0, 3:1:1, and even 4:2:0 itself) results in extra losses on top of those of the original 4:2:0 conversion, because it subsamples again without starting from the original.

    If one does have the original video data, then of course all processing can be done up front, before the first conversion/subsampling, and the result converted directly to the desired pixel format and container (e.g. H265 YUV420P or anything else).

    So the problem of OP is not one of FFmpeg's sws_scale(), but one of me not thinking before starting to code :)