I want to write a Python script that decodes an H.264 1080p video and displays it via SDL2 on a Raspberry Pi 5. The Raspberry Pi 5 plays an H.264 1080p video in VLC without problems, at a total CPU load of about 10%. Decoding with ffmpeg and displaying via SDL2, however, uses around 70% CPU. Since I want to switch seamlessly between two output videos, I will need to decode two videos at the same time, so 70% CPU load for a single decoded 1080p video is not acceptable. How can I make the code more efficient, and why is VLC so much more efficient?
This is my current Python script:
import numpy as np
import ffmpeg # ffmpeg-python
import sdl2.ext
in_file = ffmpeg.input('bbb1080_x264.mp4', re=None)
width = 1920
height = 1080
process1 = (
    in_file
    .output('pipe:', format='rawvideo', pix_fmt='bgra')
    .run_async(pipe_stdout=True)
)
sdl2.ext.init()
window = sdl2.ext.Window("Hello World!", size=(width, height))
window.show()
windowsurface = sdl2.SDL_GetWindowSurface(window.window)
windowArray = sdl2.ext.pixels3d(windowsurface.contents)
sdl2.ext.mouse.hide_cursor()
while True:
    in_bytes = process1.stdout.read(width * height * 4)
    if not in_bytes:
        break
    in_frame = (
        np
        .frombuffer(in_bytes, np.uint8)
        .reshape([height, width, 4])
        .transpose(1, 0, 2)
    )
    for event in sdl2.ext.get_events():
        if event.type == sdl2.SDL_QUIT:
            exit()
    windowArray[:] = in_frame
    window.refresh()
process1.wait()
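For a sense of scale, here is a rough estimate of how much raw data the script above pushes through the pipe. This is only back-of-the-envelope arithmetic: the 30 fps figure is an assumption (I have not checked the actual frame rate of the file), and the helper names are made up for illustration.

```python
# Rough data-rate estimate for raw frames sent through a pipe.
# The frame rate below is an assumption; substitute the real value for your file.

def raw_frame_bytes(width, height, pix_fmt):
    """Bytes per frame for two common raw pixel formats."""
    if pix_fmt == "bgra":
        return width * height * 4        # 4 bytes per pixel
    if pix_fmt == "yuv420p":
        return width * height * 3 // 2   # full-res Y plus quarter-res U and V
    raise ValueError(f"unsupported pix_fmt: {pix_fmt}")


def pipe_rate_mb_per_s(width, height, pix_fmt, fps):
    """Megabytes (10**6 bytes) per second that must cross the pipe."""
    return raw_frame_bytes(width, height, pix_fmt) * fps / 1e6


if __name__ == "__main__":
    for fmt in ("bgra", "yuv420p"):
        print(fmt,
              raw_frame_bytes(1920, 1080, fmt), "bytes/frame,",
              pipe_rate_mb_per_s(1920, 1080, fmt, fps=30), "MB/s at an assumed 30 fps")
```

Every one of those bytes is written by ffmpeg, read by Python, and then copied again into the window surface, on top of the conversion to bgra inside ffmpeg.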
It is also interesting to note that when I start VLC on a Raspberry Pi 5, this is the output on the terminal:
[00007fff78c1a550] avcodec decoder error: cannot start codec (h264_v4l2m2m)
Fontconfig warning: ignoring UTF-8: not a valid region tag
[00007fff68002d70] gles2 generic error: parent window not available
[00007fff68002d70] xcb generic error: window not available
[00007fff680013f0] mmal_xsplitter vout display: Try drm
[00007fff68002d70] drm_vout generic: <<< OpenDrmVout: Fmt=I420
[00007fff68002d70] drm_vout generic error: Failed to get xlease
It indicates that VLC is not using the h264_v4l2m2m hardware acceleration.
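I figured out how to reduce the processor load: use an HEVC-encoded file, request hardware decoding with hwaccel='auto', and write the raw output to /dev/null without the bgra conversion.
Code with the changes: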
in_file = ffmpeg.input('bbb1080_hevc.mp4', hwaccel='auto')
width = 1920
height = 1080
process1 = (
    in_file
    .output('/dev/null', format='rawvideo')
    .run_async(pipe_stdout=True)
)
This is now pure hardware decoding, and the ffmpeg processor load almost disappears. However, the code is of course unusable in this form: I would need to handle the format conversion in Python and find out why reading from stdout is so slow. Instead, I switched to C, as I should have done from the beginning and as @qwr already suggested in a comment.
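For anyone who wants to stay in Python, here is a minimal sketch of one part of the format handling. It is not what I ended up doing. It assumes ffmpeg is asked for pix_fmt='yuv420p', that width and height are even, and that the three planes are tightly packed in the raw output. `split_i420` is a name I made up.

```python
import numpy as np


def split_i420(buf, width, height):
    """Return (Y, U, V) views into one raw yuv420p (I420) frame.

    Assumes even width and height and tightly packed planes
    (Y first, then U, then V), with no row padding.
    """
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    # frombuffer creates a view on the bytes, so nothing is copied here.
    frame = np.frombuffer(buf, dtype=np.uint8, count=y_size + 2 * c_size)
    y = frame[:y_size].reshape(height, width)
    u = frame[y_size:y_size + c_size].reshape(height // 2, width // 2)
    v = frame[y_size + c_size:].reshape(height // 2, width // 2)
    return y, u, v


if __name__ == "__main__":
    width, height = 1920, 1080
    dummy = bytes(width * height * 3 // 2)  # stand-in for process1.stdout.read(...)
    y, u, v = split_i420(dummy, width, height)
    print(y.shape, u.shape, v.shape)
```

In principle the planes could then be uploaded to an SDL texture created with SDL_PIXELFORMAT_IYUV via SDL_UpdateYUVTexture, so that the conversion to RGB happens in the renderer rather than in ffmpeg or numpy. I have not measured whether that is fast enough on the Pi 5.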
Edit: I got it working in C by closely following this tutorial: http://www.dranger.com/ffmpeg/
I then call the C video player from Python via ctypes. This is much more intuitive than using PySDL2 and piping ffmpeg's output.
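The ctypes side can be sketched roughly as follows. The library name `libplayer.so` and the exported function `play_video` are placeholders for illustration, not the actual names from my project. The sketch assumes the C player is built as a shared library that exports `int play_video(const char *path)`.

```python
import ctypes


def load_player(lib_path):
    """Load a shared library and describe one exported function to ctypes.

    Hypothetical C signature (placeholder, adjust to your own player):
        int play_video(const char *path);
    """
    lib = ctypes.CDLL(lib_path)  # raises OSError if the file cannot be loaded
    lib.play_video.argtypes = [ctypes.c_char_p]
    lib.play_video.restype = ctypes.c_int
    return lib.play_video


if __name__ == "__main__":
    try:
        play_video = load_player("./libplayer.so")  # hypothetical library name
    except OSError as exc:
        print("could not load player library:", exc)
    else:
        # ctypes passes bytes objects as const char * for c_char_p arguments.
        status = play_video(b"bbb1080_hevc.mp4")
        print("player exited with status", status)
```

Declaring `argtypes` and `restype` up front means ctypes rejects a wrongly typed argument with an error instead of passing garbage to the C code.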