torchaudio.io not properly using ffmpeg

I am following the PyTorch tutorial on hardware-accelerated GPU encoding/decoding [https://pytorch.org/audio/main/hw_acceleration_tutorial.html], and I am encountering an error with the following code:

import torch
import torchaudio

print(torch.__version__) # 1.14.0.dev20221013+cu116
print(torchaudio.__version__) # 0.13.0.dev20221013+cu116
print(torchaudio._extension._FFMPEG_INITIALIZED) # True

from torchaudio.io import StreamReader
local_src = "vid.mp4"
cuda_conf = {
    "decoder": "h264_cuvid",  # Use CUDA HW decoder
    "hw_accel": "cuda:0",  # Then keep the memory on CUDA:0
}

def decode_vid(src, config):
    frames = []
    s = StreamReader(src)
    s.add_video_stream(5, **config)
    for i, (chunk,) in enumerate(s.stream()):
        frames.append(chunk[0])
    return frames  # without this, decode_vid returns None

if __name__ == "__main__":
    vid = decode_vid(local_src, cuda_conf)

The error message (somewhat truncated) is:

  File "/home/james/PycharmProjects/AlphaPose/Spectronix/Early_Experiments/vid_gpu_decode.py", line 23, in decode_vid
    s.add_video_stream(5, **config)
  File "/home/james/anaconda3/envs/alphapose/lib/python3.7/site-packages/torchaudio/io/_stream_reader.py", line 624, in add_video_stream
    hw_accel,
RuntimeError: Unsupported codec: "h264_cuvid".

I have an RTX 3090 Ti GPU, which does support the h264_cuvid decoder, and I have been able to decode a video on the command line by running the following (taken from the tutorial linked above):

sudo ffmpeg -hide_banner -y -vsync 0 -hwaccel cuvid -hwaccel_output_format cuda -c:v h264_cuvid -i "https://download.pytorch.org/torchaudio/tutorial-assets/stream-api/NASAs_Most_Scientifically_Complex_Space_Observatory_Requires_Precision-MP4_small.mp4" -c:a copy -c:v h264_nvenc -b:v 5M test.mp4

So it seems torchaudio.io is not properly using ffmpeg. Any insight into how to fix this problem would be much appreciated. I'm using Ubuntu 22.04.

Solution

RuntimeError: Unsupported codec: "h264_cuvid".

The error is raised before StreamReader reaches any NVDEC-specific code, so this is a generic FFmpeg compatibility issue.

This suggests that the libavcodec found at runtime is not configured with h264_cuvid.
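
One way to see which libavcodec the Python process actually loaded is to inspect its memory map. The following is only a sketch for Linux: it assumes that importing torchaudio loads the FFmpeg libraries into the process (as `_FFMPEG_INITIALIZED` being True in the question suggests), and `loaded_shared_objects` is a hypothetical helper name, not part of torchaudio.

```python
import os


def loaded_shared_objects(keyword, maps_path="/proc/self/maps"):
    """Return sorted, de-duplicated paths of mapped files whose file name contains `keyword`.

    Hypothetical helper: it parses the Linux /proc/<pid>/maps format, so it
    only works on Linux and only sees libraries that are already loaded.
    """
    paths = set()
    with open(maps_path) as maps:
        for line in maps:
            # Fields: address, perms, offset, dev, inode, then an optional pathname.
            fields = line.split(None, 5)
            if len(fields) < 6:
                continue  # anonymous mapping, no file behind it
            path = fields[5].strip()
            if keyword in os.path.basename(path):
                paths.add(path)
    return sorted(paths)


# Example, to be run in the same environment that raises the error:
#   import torchaudio  # assumed to load the FFmpeg libraries
#   print(loaded_shared_objects("libavcodec"))
```

If the path printed belongs to a different installation than the `ffmpeg` binary used on the command line (for example, one inside the conda environment), that would be consistent with the explanation here.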

A possible explanation is that there are multiple installations of FFmpeg on your system: torchaudio picks up one without NVDEC support, while the ffmpeg command you invoke loads one with NVDEC support.

Perhaps you can check your system and see if there are multiple FFmpeg installations and remove the ones without NVDEC support?
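
As a rough way to carry out that check, the sketch below lists every `ffmpeg` executable on `PATH` and asks each one whether `h264_cuvid` appears in the output of `ffmpeg -decoders`. The helper names are hypothetical, and the parsing assumes the usual listing layout in which the decoder name is the second whitespace-separated column.

```python
import os
import subprocess


def find_executables(name, search_dirs):
    """Return the resolved paths of all executables called `name` in `search_dirs`."""
    found = []
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            real = os.path.realpath(candidate)  # collapse symlinks to one entry
            if real not in found:
                found.append(real)
    return found


def decoder_in_listing(listing, decoder):
    """Check whether `decoder` is named in text produced by `ffmpeg -decoders`."""
    for line in listing.splitlines():
        fields = line.split()
        # Assumed layout: capability flags first, decoder name second.
        if len(fields) >= 2 and fields[1] == decoder:
            return True
    return False


def supports_decoder(ffmpeg_path, decoder):
    """Run one ffmpeg binary and report whether it lists `decoder`."""
    result = subprocess.run(
        [ffmpeg_path, "-hide_banner", "-decoders"],
        capture_output=True, text=True, timeout=30,
    )
    return decoder_in_listing(result.stdout, decoder)


if __name__ == "__main__":
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    for path in find_executables("ffmpeg", dirs):
        try:
            print(path, supports_decoder(path, "h264_cuvid"))
        except (OSError, subprocess.SubprocessError) as exc:
            print(path, "could not be queried:", exc)
```

This only covers the command-line binaries. torchaudio loads the FFmpeg shared libraries through the dynamic loader rather than through `PATH`, so a binary that lists `h264_cuvid` does not by itself show which libavcodec Python ends up using. Since the command in the question was run with `sudo`, it may also be worth running the script both with and without `sudo`, because root's `PATH` can differ from the user's.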