python ffmpeg raspberry-pi sdl-2 ffmpeg-python

Play a video with ffmpeg and SDL2 on a Raspberry Pi 5

I want to write a Python script that decodes an H.264 1080p video and displays it via SDL2 on a Raspberry Pi 5. The Raspberry Pi 5 can play an H.264 1080p video in VLC without problems, at about 10% total CPU load. Decoding with ffmpeg and displaying via SDL2, however, uses around 70% CPU. Since I want to switch seamlessly between two videos, I will need to decode two videos at the same time, so 70% CPU load for a single decoded 1080p video is not acceptable. How can I make the code more efficient, and why is VLC so much more efficient?

This is my current python script:

import numpy as np
import ffmpeg  # ffmpeg-python
import sdl2.ext

in_file = ffmpeg.input('bbb1080_x264.mp4', re=None)

width = 1920
height = 1080

process1 = (
    in_file
    .output('pipe:', format='rawvideo', pix_fmt='bgra')
    .run_async(pipe_stdout=True)
)

sdl2.ext.init()
window = sdl2.ext.Window("Hello World!", size=(width, height))
window.show()
windowsurface = sdl2.SDL_GetWindowSurface(window.window)
windowArray = sdl2.ext.pixels3d(windowsurface.contents)

sdl2.ext.mouse.hide_cursor()

while True:
    in_bytes = process1.stdout.read(width * height * 4)

    if not in_bytes:
        break

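    # ffmpeg delivers rows first, i.e. (height, width, 4); pixels3d exposes
    # the window surface as (width, height, 4), hence the transpose.
    in_frame = (
        np
        .frombuffer(in_bytes, np.uint8)
        .reshape([height, width, 4])
        .transpose(1, 0, 2)
    )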

    for event in sdl2.ext.get_events():
        if event.type == sdl2.SDL_QUIT:
            exit()

    windowArray[:] = in_frame
    window.refresh()

process1.wait()

It is also interesting to note that when I start VLC on the Raspberry Pi 5, this is the output in the terminal:

[00007fff78c1a550] avcodec decoder error: cannot start codec (h264_v4l2m2m)
Fontconfig warning: ignoring UTF-8: not a valid region tag
[00007fff68002d70] gles2 generic error: parent window not available
[00007fff68002d70] xcb generic error: window not available
[00007fff680013f0] mmal_xsplitter vout display: Try drm
[00007fff68002d70] drm_vout generic: <<< OpenDrmVout: Fmt=I420
[00007fff68002d70] drm_vout generic error: Failed to get xlease

It indicates that VLC is not using the h264_v4l2m2m hardware acceleration.

Solution

I figured out how to reduce the processor load:

Enable hardware acceleration for decoding
Disable the pixel format conversion
Avoid piping the frames through stdout
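
To get a feeling for why the last two points matter, here is a rough calculation of how much raw data has to pass through the pipe per second. It is only a sketch: the 30 fps frame rate is an assumption (the real frame rate of the test file is not stated here), and `frame_bytes` and `pipe_rate_mb_per_s` are helper names made up for this illustration.

```python
# Bytes per pixel for two raw pixel formats that ffmpeg can output.
# bgra:    4 bytes per pixel.
# yuv420p: 1 luma byte per pixel plus 2 chroma bytes shared by 4 pixels.
BYTES_PER_PIXEL = {"bgra": 4.0, "yuv420p": 1.5}


def frame_bytes(width, height, pix_fmt):
    """Size of one raw frame in bytes."""
    return int(width * height * BYTES_PER_PIXEL[pix_fmt])


def pipe_rate_mb_per_s(width, height, pix_fmt, fps):
    """Raw data rate in MB/s (1 MB = 10**6 bytes)."""
    return frame_bytes(width, height, pix_fmt) * fps / 1e6


if __name__ == "__main__":
    fps = 30  # assumption, not taken from the actual file
    for fmt in ("bgra", "yuv420p"):
        size = frame_bytes(1920, 1080, fmt)
        rate = pipe_rate_mb_per_s(1920, 1080, fmt, fps)
        print(f"{fmt}: {size} bytes/frame, {rate:.1f} MB/s")
```

At the assumed 30 fps, a 1080p `bgra` stream is roughly 249 MB/s, while `yuv420p` is roughly 93 MB/s. Every one of those bytes is written by ffmpeg, read by Python, and then copied into the window surface.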

Code with the changes:

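import ffmpeg  # ffmpeg-python

# Let ffmpeg pick a hardware decoder if one is available.
in_file = ffmpeg.input('bbb1080_hevc.mp4', hwaccel='auto')

width = 1920
height = 1080

# No pix_fmt conversion, and the decoded frames are discarded
# instead of being piped to Python.
process1 = (
    in_file
    .output('/dev/null', format='rawvideo')
    .run_async(pipe_stdout=True)
)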

This is now pure hardware decoding and almost eliminates the ffmpeg CPU load. Of course, the code is now unusable as a player: I would still need to handle the format conversion in Python and find out why reading from stdout is so slow. Instead, I switched to C, as I should have done from the beginning and as @qwr already suggested in a comment.
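
For anyone who wants to stay in Python: one possible way to handle the format conversion is to request planar output from ffmpeg (`pix_fmt='yuv420p'`) and pass the three planes to an SDL2 YUV texture, for example via `SDL_UpdateYUVTexture`, so that the color conversion is done by the renderer instead of the CPU. This was not tested here. The sketch below only shows the plane splitting, assumes even width and height, and uses a made-up helper name `split_yuv420p`.

```python
def split_yuv420p(frame, width, height):
    """Return (y, u, v) views into one raw yuv420p frame without copying.

    Layout assumed: full-resolution Y plane, followed by the U and V planes
    at half resolution in both directions (even width and height).
    """
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    if len(frame) != y_size + 2 * c_size:
        raise ValueError("unexpected frame size for yuv420p")

    view = memoryview(frame)  # slicing a memoryview does not copy the data
    y = view[:y_size]
    u = view[y_size:y_size + c_size]
    v = view[y_size + c_size:]
    return y, u, v
```

The row pitches for such a texture would then be `width` for the Y plane and `width // 2` for the U and V planes.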

Edit: I got it working now using C and following this tutorial closely: http://www.dranger.com/ffmpeg/

Then I call the video player I created in C from Python via ctypes. This way everything is much more intuitive than using PySDL2 and piping the ffmpeg output.
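
The ctypes side can look roughly like the sketch below. The library name `libplayer.so` and the exported function `play_video` are hypothetical placeholders, not the actual names from my player; adapt them to whatever your C code exports.

```python
import ctypes


def bind_player(lib):
    """Declare the C signature of a hypothetical play_video() and wrap it.

    Hypothetical C prototype: int play_video(const char *path);
    """
    lib.play_video.argtypes = [ctypes.c_char_p]
    lib.play_video.restype = ctypes.c_int

    def play(path):
        # c_char_p expects bytes, so encode the Python string first.
        return lib.play_video(path.encode("utf-8"))

    return play


# Usage, assuming the player was built as a shared library named libplayer.so:
# play = bind_player(ctypes.CDLL("./libplayer.so"))
# play("bbb1080_hevc.mp4")
```

Declaring `argtypes` and `restype` explicitly lets ctypes check the argument on every call instead of guessing the conversion.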