I want to implement a Python project that takes an .mp4 file as input and produces a transcript or subtitles for the video as output. The constraint is that it must use OpenVINO. How can I do that?
MP4 is a container format, not an audio format. I believe the current OpenVINO speech demos and samples take WAV files as input, since that is what the models were trained on.
If you can extract the audio track from the MP4 container and convert it to WAV with a conversion tool, that may work.
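One way to do the conversion step from Python is to call the ffmpeg command-line tool. The sketch below is only an illustration under some assumptions: ffmpeg is installed and on your PATH, and the speech model wants 16 kHz mono 16-bit PCM (a common choice for speech models, but check what your particular OpenVINO model expects). The function names `build_ffmpeg_command` and `extract_wav` are made up for this example and are not part of OpenVINO.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Return the ffmpeg argument list that extracts the audio track as WAV."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(video_path),     # input container (.mp4)
        "-vn",                     # ignore the video stream
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM (plain WAV)
        "-ac", "1",                # mono (assumption about the model)
        "-ar", str(sample_rate),   # sample rate; 16 kHz is an assumption
        str(wav_path),
    ]


def extract_wav(video_path, wav_path=None, sample_rate=16000):
    """Run ffmpeg and return the path of the WAV file it wrote."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    video_path = Path(video_path)
    # Default output: same name as the video, with a .wav extension.
    wav_path = Path(wav_path) if wav_path else video_path.with_suffix(".wav")
    command = build_ffmpeg_command(video_path, wav_path, sample_rate)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return wav_path
```

Calling `extract_wav("talk.mp4")` should write `talk.wav` next to the video, which you can then pass to the OpenVINO speech demo or sample in place of its example WAV file.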