
Turn off sw_scale conversion to planar YUV 32 byte alignment requirements


I am experiencing artifacts on the right edge of scaled and converted images when converting into planar YUV pixel formats with sw_scale. I am reasonably sure (although I cannot find it anywhere in the documentation) that this is because sw_scale uses an optimization that assumes 32 byte aligned lines in the destination. However, I would like to turn this off because I am using sw_scale for image composition, so even though the destination lines may be 32 byte aligned, the output image may not be.

Example.

Full output frame is 1280x720 yuv422p10le (this is 32 byte aligned). However, into the top left corner I am scaling an image with an output width of 1280 / 3 = 426. A width of 426 in this format is not 32 byte aligned, but I believe sw_scale sees that the output linesize is 32 byte aligned and writes past the width of 426, putting garbage in the next 22 bytes of data on the assumption that this is simply padding, when in my case it is displayable area.

This is why I need to actually disable this optimization or somehow trick sw_scale into believing it does not apply while keeping intact the way the program works, which is otherwise fine.
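
The alignment arithmetic in the example above can be sketched as follows. This is only an illustration, assuming that the vectorized code rounds each written row up to the next 32 byte boundary; the helper names round_up and overrun_bytes are hypothetical and not part of swscale, and the real overrun may depend on the code path taken.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes between the end of the visible row and the next aligned boundary.
 * Assumption: writes are rounded up to `align` bytes per row. */
size_t overrun_bytes(size_t samples, size_t bytes_per_sample, size_t align)
{
    size_t row = samples * bytes_per_sample;
    return round_up(row, align) - row;
}
```

For yuv422p10le with an output width of 426, the luma row is 426 * 2 = 852 bytes, so overrun_bytes(426, 2, 32) is 12, and each chroma row is 213 * 2 = 426 bytes, so overrun_bytes(213, 2, 32) is 22. With the width reduced to 416, both overrun_bytes(416, 2, 32) and overrun_bytes(208, 2, 32) are 0.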

I have tried adding extra padding to the destination lines so they are no longer 32 byte aligned; as far as I can tell, this did not help.

Edit with code example. Rendering is omitted for ease of use. Here is a similar issue; unfortunately, as I stated, their fix will not work for my use case: https://github.com/obsproject/obs-studio/pull/2836

Use the commented-out line of code to switch between an output width that is 32 byte aligned and one that isn't.

#include "libswscale/swscale.h"
#include "libavutil/imgutils.h"
#include "libavutil/pixelutils.h"
#include "libavutil/pixfmt.h"
#include "libavutil/pixdesc.h"
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* memset */

int main(int argc, char **argv) {

/// Set up a 1280x720 window, and an item with 1/3 width and height of the window.
int window_width, window_height, item_width, item_height;
window_width = 1280;
window_height = 720;
item_width = (window_width / 3);
item_height = (window_height / 3);

int item_out_width = item_width;
/// This line makes the item width 32 byte aligned; uncomment it to see uncorrupted results
/// Note %16 because outformat is 2 bytes per component
//item_out_width -= (item_width % 16);

enum AVPixelFormat outformat = AV_PIX_FMT_YUV422P10LE;
enum AVPixelFormat informat = AV_PIX_FMT_UYVY422;
int window_lines[4] = {0};
av_image_fill_linesizes(window_lines, outformat, window_width);

uint8_t *window_planes[4] = {0};
window_planes[0] = calloc(1, window_lines[0] * window_height);
window_planes[1] = calloc(1, window_lines[1] * window_height);
window_planes[2] = calloc(1, window_lines[2] * window_height); /// Fill the window with all 0s, this is green in yuv.


int item_lines[4] = {0};
av_image_fill_linesizes(item_lines, informat, item_width);

uint8_t *item_planes[4] = {0};
item_planes[0] = malloc(item_lines[0] * item_height);
memset(item_planes[0], 100, item_lines[0] * item_height);

struct SwsContext *ctx;
ctx = sws_getContext(item_width, item_height, informat,
               item_out_width, item_height, outformat, SWS_FAST_BILINEAR, NULL, NULL, NULL);

/// Check a block in the normal region
printf("Pre scale normal region %d %d %d\n", (int)((uint16_t*)window_planes[0])[0], (int)((uint16_t*)window_planes[1])[0],
       (int)((uint16_t*)window_planes[2])[0]);

/// Check a block in the corrupted region (should be all zeros) These values should be out of the converted region
int corrupt_offset_y = (item_out_width + 3) * 2; ///(item_width + 3) * 2 bytes per component Y PLANE
int corrupt_offset_uv = (item_out_width + 3); ///(item_width + 3) * (2 bytes per component rshift 1 for horiz scaling) U and V PLANES

printf("Pre scale corrupted region %d %d %d\n", (int)(*((uint16_t*)(window_planes[0] + corrupt_offset_y))),
       (int)(*((uint16_t*)(window_planes[1] + corrupt_offset_uv))), (int)(*((uint16_t*)(window_planes[2] + corrupt_offset_uv))));
sws_scale(ctx, (const uint8_t**)item_planes, item_lines, 0, item_height,window_planes, window_lines);

/// Perform the same tests after scaling
printf("Post scale normal region %d %d %d\n", (int)((uint16_t*)window_planes[0])[0], (int)((uint16_t*)window_planes[1])[0],
       (int)((uint16_t*)window_planes[2])[0]);
printf("Post scale corrupted region %d %d %d\n", (int)(*((uint16_t*)(window_planes[0] + corrupt_offset_y))),
       (int)(*((uint16_t*)(window_planes[1] + corrupt_offset_uv))), (int)(*((uint16_t*)(window_planes[2] + corrupt_offset_uv))));

return 0;

}

Example Output:

//No alignment
Pre scale normal region 0 0 0
Pre scale corrupted region 0 0 0
Post scale normal region 400 400 400
Post scale corrupted region 512 36865 36865

//With alignment
Pre scale normal region 0 0 0
Pre scale corrupted region 0 0 0
Post scale normal region 400 400 400
Post scale corrupted region 0 0 0

Solution

  • I believe sw_scale sees that the output linesize is 32 byte aligned and overwrites the width of 426 putting garbage in the next 22 bytes of data thinking this is simply padding when in my case this is displayable area.

    That's actually correct; swscale does indeed do that. Good analysis. There are two ways to get rid of this:

    • disable all SIMD code using av_set_cpu_flags_mask(0) (newer FFmpeg releases deprecate this function in favor of av_force_cpu_flags(0)).
    • write the re-scaled 426xN image in a temporary buffer and then manually copy the pixels into the unpadded destination plane.

    ffmpeg/swscale overwrites the destination for performance reasons. If you don't care about runtime and want the simplest code, use the first solution. If you do want performance and don't mind slightly more complicated code, use the second solution.
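
The copy step of the second solution might look like the minimal sketch below. It assumes sws_scale has already written the 426xN result into a temporary buffer whose lines are padded, so any overrun lands in padding; copy_plane is a hypothetical helper name used for illustration. It would be called once per plane with bytewidth set to the visible samples times 2 bytes for yuv422p10le. libavutil's av_image_copy_plane takes the same kind of arguments and could likely be used instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy `height` rows of exactly `bytewidth` bytes from a padded temporary
 * plane into the destination plane. Only the visible bytes of each row are
 * copied, so whatever the scaler wrote into the temporary padding never
 * reaches the destination. */
void copy_plane(uint8_t *dst, int dst_linesize,
                const uint8_t *src, int src_linesize,
                int bytewidth, int height)
{
    assert(bytewidth >= 0);
    assert(dst_linesize >= bytewidth && src_linesize >= bytewidth);
    for (int y = 0; y < height; y++) {
        memcpy(dst, src, (size_t)bytewidth);
        dst += dst_linesize; /* advance by the full destination stride */
        src += src_linesize; /* advance by the padded source stride */
    }
}
```

In the question's setup, the destination pointer would be window_planes[i] and the destination linesize window_lines[i], so the copy places the item in the top left corner of the window without touching the pixels to its right.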