ffmpeg - stretched pixel issue


Context

I'm converting a PNG sequence into a video using FFMPEG. The images are semi-transparent portraits where the background has been removed digitally.

Issue

The edge pixels of the subject are stretched all the way to the frame border, creating a fully opaque video.

Cause Analysis

The process worked fine in the previous workflow, which used rembg from the command line. However, since I started calling rembg from a Python script with alpha_matting to obtain higher-quality results, the resulting video has this issue.

The issue is present in both webm format (target) and mp4 (used for testing).

Command Used

Command used for webm is:

ffmpeg -thread_queue_size 64 -framerate 30 -i <png sequence location> -c:v libvpx -b:v 0 -crf 18 -pix_fmt yuva420p -auto-alt-ref 0 -c:a libvorbis <png output>

Troubleshooting Steps Taken

  1. PNG visual inspection: the PNG images have a fully transparent background, as desired.
  2. PNG alpha measurement: I wrote a couple of Python scripts to inspect the alpha level of the pixels and confirmed that no background pixel has a subtle (non-zero) alpha level.
  3. Exported MP4 with AE: using the native AE renderer, the resulting MP4/H.265 has a black background, so it does not show the stretched-pixel issue.
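
A check like the one in step 2 can be extended to also look at the RGB values stored in fully transparent pixels, since those values are invisible in an image viewer but matter if a player ignores alpha. The sketch below is only an illustration: `summarize_alpha` is a hypothetical helper name, not the script that was actually used, and it assumes the pixels arrive as (R, G, B, A) tuples, for example from `img.getdata()` on an RGBA Pillow image.

```python
from collections import Counter


def summarize_alpha(pixels):
    """Classify RGBA pixels by alpha and count hidden non-black RGB values.

    `pixels` is any iterable of (R, G, B, A) tuples (assumption: 8-bit RGBA).
    """
    counts = Counter()
    for r, g, b, a in pixels:
        if a == 0:
            counts["transparent"] += 1
            if (r, g, b) != (0, 0, 0):
                # Hidden by an alpha-aware viewer, shown if alpha is ignored.
                counts["transparent_with_color"] += 1
        elif a == 255:
            counts["opaque"] += 1
        else:
            counts["partial"] += 1
    return counts


# Tiny synthetic example: two background pixels, one edge pixel, one subject pixel.
sample = [(0, 0, 0, 0), (180, 120, 90, 0), (180, 120, 90, 128), (180, 120, 90, 255)]
report = summarize_alpha(sample)
print(dict(report))
```

A non-zero `transparent_with_color` count means the background looks clean in a viewer yet still carries color data.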

Image of the Issue

[image]

Sample PNG image from sequence: [image]

Code Context

Calling rembg through its Python API with alpha_matting seems to generate output in which pixels with 0 alpha keep non-black RGB values (with truly premultiplied alpha, those pixels would be black).

remove(input_data, alpha_matting=True, alpha_matting_foreground_threshold=250,
                    alpha_matting_background_threshold=250, alpha_matting_erode_size=12)

A test that crudely resets the RGB of 0-alpha pixels to black confirms that the video is being played using the RGB values while the alpha is ignored.

def reset_alpha_pixels(img):
    # Process each pixel of an already opened RGBA image
    data = list(img.getdata())
    new_data = []
    for item in data:
        if item[3] == 0:
            # Fully transparent pixel: reset its RGB to black
            new_data.append((0, 0, 0, 0))
        else:
            # Any other pixel is kept unchanged
            new_data.append((item[0], item[1], item[2], item[3]))

    # Update the image data
    img.putdata(new_data)

    return img
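
The per-pixel loop above can be slow on a long sequence. A vectorized variant might look like the following sketch, which assumes the frame is already an (H, W, 4) uint8 NumPy array; `reset_alpha_pixels_np` is a hypothetical name chosen for illustration.

```python
import numpy as np


def reset_alpha_pixels_np(rgba):
    """Return a copy of an (H, W, 4) uint8 array with RGB zeroed where alpha is 0."""
    out = rgba.copy()
    hidden = out[..., 3] == 0   # Boolean mask of fully transparent pixels.
    out[hidden, :3] = 0         # Black out their RGB; alpha stays 0.
    return out


# One transparent pixel carrying color, one opaque pixel.
frame = np.array([[[180, 120, 90, 0], [180, 120, 90, 255]]], dtype=np.uint8)
cleaned = reset_alpha_pixels_np(frame)
print(cleaned)
```

A Pillow image can be converted with `np.array(img)` and turned back with `Image.fromarray(arr)`.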

Updates

  • Added python context to make the question more relevant within SO scope.

Solution

  • The issue is related to the video player.
    Most video players don't support transparency and ignore the alpha (transparency) channel.
    Such a player displays the RGB content of the background even though the background is supposed to be hidden (the background pixels are fully transparent according to their alpha value). Apparently, the background of the rembg output is not filled with solid black or white, but contains the "stretched" pixels.

    When opening the PNG image, or when playing the video in the Chrome browser for example, the background is transparent (the RGB values are hidden), and we can't see the "stretched effect".


    Solving the issue using FFmpeg is challenging.
    It is better to fix the issue in the Python code, after applying rembg.

    To fix the issue, we may select a solid background color, for example a (200, 200, 200) gray, and apply alpha compositing between the RGB channels and that background.

    • Extract RGB channels:
        foreground_rgb = image_after_rembg[:, :, 0:3]  # Extract RGB channels.
    
    • Extract alpha (transparency) channel and convert from range [0, 255] to [0, 1]:
        alpha_ch = image_after_rembg[:, :, 3]  # Extract alpha (transparency) channel; kept as uint8 for the last step.
        alpha = alpha_ch.astype(np.float32) / 255  # Convert alpha from range [0, 255] to [0, 1].
        alpha = alpha[..., np.newaxis]  # Add axis - new alpha shape is (1024, 1024, 1). We need it for scaling 3D rgb by 2D alpha channel.
    
    • Set background RGB color to light gray color (for example):
        background_rgb = np.full_like(foreground_rgb, (200, 200, 200))  # Set background RGB color to light gray color (for example).
    
    • Apply "alpha compositing" of rgb and background_rgb:
        composed_rgb = foreground_rgb.astype(np.float32) * alpha + background_rgb.astype(np.float32) * (1 - alpha)
        composed_rgb = composed_rgb.round().astype(np.uint8)  # Convert to uint8 with rounding.
    
    • Add the original alpha channel to composed_rgb:
        composed_rgba = np.dstack((composed_rgb, alpha_ch))
    

    Complete Python code sample:

    from PIL import Image
    import numpy as np
    #from rembg import remove
    
    #image_file_before_rembg = 'input.png'
    image_file_after_rembg = 'frame-00001.png'
    
    # Assume code for removing background looks as follows:
    #image_before_rembg = Image.open(image_file_before_rembg)
    #image_after_rembg = remove(image_before_rembg)
    #image_after_rembg.save(image_file_after_rembg)
    
    image_after_rembg = Image.open(image_file_after_rembg)  # Skip background removing, and read the result from a file.
    image_after_rembg = np.array(image_after_rembg)  # Convert PIL to NumPy array.
    
    foreground_rgb = image_after_rembg[:, :, 0:3]  # Extract RGB channels.
    alpha_ch = image_after_rembg[:, :, 3]  # Extract alpha (transparency) channel
    alpha = alpha_ch.astype(np.float32) / 255  # Convert alpha from range [0, 255] to [0, 1].
    alpha = alpha[..., np.newaxis]  # Add axis - new alpha shape is (1024, 1024, 1). We need it for scaling 3D rgb by 2D alpha channel.
    
    background_rgb = np.full_like(foreground_rgb, (200, 200, 200))  # Set background RGB color to light gray color (for example).
    
    # Apply "alpha compositing" of rgb and background_rgb
    composed_rgb = foreground_rgb.astype(np.float32) * alpha + background_rgb.astype(np.float32) * (1 - alpha)
    
    composed_rgb = composed_rgb.round().astype(np.uint8)  # Convert to uint8 with rounding.
    
    composed_rgba = np.dstack((composed_rgb, alpha_ch))  # Add the original alpha channel to composed_rgb
    
    
    Image.fromarray(composed_rgba).save('new_frame-00001.png')  # Save the RGBA output image to PNG file
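
Since the question is about a whole PNG sequence, the same steps may be easier to reuse as a function applied to each frame. The following is a minimal sketch under that assumption; `fill_hidden_background` is a hypothetical name, and it expects an (H, W, 4) uint8 array such as `np.array(Image.open(...))` for an RGBA PNG.

```python
import numpy as np


def fill_hidden_background(rgba, background=(200, 200, 200)):
    """Composite RGB over a solid color while keeping the original alpha.

    `rgba` is an (H, W, 4) uint8 array; the result has the same shape and dtype.
    """
    rgb = rgba[..., :3].astype(np.float32)
    alpha_ch = rgba[..., 3]                                   # Original uint8 alpha.
    alpha = (alpha_ch.astype(np.float32) / 255)[..., np.newaxis]
    bg = np.asarray(background, dtype=np.float32)             # Broadcasts over H and W.
    composed = (rgb * alpha + bg * (1 - alpha)).round().astype(np.uint8)
    return np.dstack((composed, alpha_ch))                    # Reattach the alpha.


# Transparent, opaque and half-transparent test pixels.
frame = np.array(
    [[[180, 120, 90, 0], [180, 120, 90, 255], [100, 100, 100, 128]]],
    dtype=np.uint8,
)
fixed = fill_hidden_background(frame)
print(fixed)
```

Fully transparent pixels end up with the background color in their RGB, opaque pixels are unchanged, and the alpha channel is identical to the input, so alpha-aware players still show transparency.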
    

    Executing FFmpeg:

    ffmpeg -y -framerate 30 -loop 1 -t 5 -i new_frame-00001.png -vf "format=rgba" -c:v libvpx -crf 18 -pix_fmt yuva420p -auto-alt-ref 0 out.webm
    

    When playing with Chrome browser, the background is transparent.

    When playing with VLC Player, the background is light gray.


    To get the same result using only the FFmpeg CLI, we have to use the alphaextract, overlay and alphamerge filters.

    Example (5 seconds at 3fps for testing):

    ffmpeg -y -framerate 3 -loop 1 -i frame-00001.png -filter_complex "color=white:r=3[bg];[0:v]format=rgba,split=2[va][vb];[vb]alphaextract[alpha];[bg][va]scale2ref[bg0][v0];[bg0][v0]overlay=shortest=1,format=rgb24[rgb];[rgb][alpha]alphamerge" -c:v libvpx -crf 18 -pix_fmt yuva420p -auto-alt-ref 0 -t 5 out.webm
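
The filter graph in that command is dense, so the sketch below rebuilds it one stage at a time with a comment per stage. It only assembles the argument list and does not run FFmpeg; `build_flatten_filter` and `build_command` are hypothetical helper names, and the file names are the ones used in the example above.

```python
def build_flatten_filter(bg_color="white", fps=3):
    """Assemble the filter graph stage by stage; labels match the command above."""
    stages = [
        # Solid-color source that becomes the new hidden background.
        f"color={bg_color}:r={fps}[bg]",
        # Force RGBA and duplicate the input: one copy for color, one for alpha.
        "[0:v]format=rgba,split=2[va][vb]",
        # Keep the original alpha channel as a grayscale stream.
        "[vb]alphaextract[alpha]",
        # Resize the background to the frame size of the input.
        "[bg][va]scale2ref[bg0][v0]",
        # Composite the input over the background, then drop the alpha.
        "[bg0][v0]overlay=shortest=1,format=rgb24[rgb]",
        # Reattach the original alpha to the composited RGB.
        "[rgb][alpha]alphamerge",
    ]
    return ";".join(stages)


def build_command(src, dst, fps=3, seconds=5):
    """Return the FFmpeg argument list for a looped single-frame test clip."""
    return [
        "ffmpeg", "-y", "-framerate", str(fps), "-loop", "1", "-i", src,
        "-filter_complex", build_flatten_filter(fps=fps),
        "-c:v", "libvpx", "-crf", "18", "-pix_fmt", "yuva420p",
        "-auto-alt-ref", "0", "-t", str(seconds), dst,
    ]


cmd = build_command("frame-00001.png", "out.webm")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.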