I have a function that is supposed to read the audio signal associated with each video frame:
```python
import cv2

def read_audio_frames(file_path):
    # Open the video file
    video = cv2.VideoCapture(file_path)
    # Get audio properties
    frame_rate = video.get(cv2.CAP_PROP_FPS)
    num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    # Read the audio frames
    audio_frames = []
    for _ in range(num_frames):
        ret, frame = video.read()
        if not ret:
            break
        audio_frames.append(frame)
    # Release the video object
    video.release()
    return audio_frames
```
I call it like this:

```python
# Read audio frames
audio_frames = read_audio_frames(file_path)
audio_frames[0]            # each separate audio frame
len(audio_frames[0][0])    # = 1080; I expected the sample rate, 44100
audio_frames[0][0][0]
```

The last expression returned

```
array([3, 1, 1], dtype=uint8)
```

However, I don't understand why.
First, the audio is stereo. Shouldn't the result contain two values, one per channel, instead of three? The values don't look right either: uint8 only ranges from 0 to 255 (8 bits), far less than the 16-bit, 24-bit, or 32-bit samples used by audio formats.
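For reference, the intuition about bit depth holds for PCM audio: a signed b-bit sample spans -2^(b-1) to 2^(b-1) - 1, and it is common to scale samples to floats where full scale is 1.0 before doing any level math. (8-bit WAV audio is unsigned, and 32-bit audio is often stored as IEEE float rather than integers, so those two cases differ.) A minimal sketch; pcm_to_float is a hypothetical helper name, not an OpenCV function:

```python
def pcm_to_float(sample: int, bits: int) -> float:
    """Scale one signed PCM integer sample to a float in [-1.0, 1.0).

    Signed b-bit PCM spans -2**(b-1) .. 2**(b-1) - 1, so dividing by
    2**(b-1) maps negative full scale to exactly -1.0.
    """
    full_scale = 2 ** (bits - 1)
    if not -full_scale <= sample < full_scale:
        raise ValueError(f"{sample} does not fit in signed {bits}-bit PCM")
    return sample / full_scale


# Integer ranges for the signed bit depths mentioned above
for bits in (16, 24, 32):
    print(bits, -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
# 16 -32768 32767
# 24 -8388608 8388607
# 32 -2147483648 2147483647
```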
Also, I don't understand the indexing. The first index, audio_frames[0], corresponds to the video frame, so audio_frames[0][0] should already correspond to an audio sample (i.e. one of the 44100 samples per second), not audio_frames[0][0][0].
A different index also returned the same value:

```
audio_frames[15][2][0]
array([3, 1, 1], dtype=uint8)
```
Notice that for a different .mp4 file:

```
len(audio_frames[0])=300
len(audio_frames[0][0])=1920
len(audio_frames[0][0][0])=3
```
This indicates that len(audio_frames[0][0]) does not correspond to the number of audio samples.
How is audio represented in cv2? I want to write a function that reads off the audio signal as dB values.
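For the dB part, a minimal sketch, assuming the audio has already been decoded into floating-point samples where full scale is 1.0. It computes the RMS level in dBFS, 20 * log10(rms). With this plain RMS convention a full-scale sine reads about -3.01 dBFS; some tools add 3.01 dB so that a full-scale sine reads 0 dBFS. The names rms_dbfs and dbfs_per_block are hypothetical:

```python
import math
from typing import List, Sequence


def rms_dbfs(samples: Sequence[float], floor_db: float = -120.0) -> float:
    """RMS level of `samples` in dBFS, assuming full scale is 1.0.

    Silence (or an empty block) is clamped to `floor_db`, since log10(0) is undefined.
    """
    if len(samples) == 0:
        return floor_db
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0.0:
        return floor_db
    return max(20.0 * math.log10(rms), floor_db)


def dbfs_per_block(samples: Sequence[float], block_size: int) -> List[float]:
    """Split `samples` into consecutive blocks and return one dBFS value per block."""
    return [rms_dbfs(samples[i:i + block_size]) for i in range(0, len(samples), block_size)]


print(rms_dbfs([1.0, -1.0, 1.0, -1.0]))  # 0.0  (full-scale square wave)
print(round(rms_dbfs([0.5] * 10), 2))    # -6.02
```

To get one level per video frame, a natural block size is sample_rate / fps, e.g. 44100 / 30 = 1470 samples; round it when the frame rate is not an integer such as 29.97.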
OpenCV can read audio data from media files, although as of July 2023 this is not documented.
There is some example code in various places in the samples/ directory, specifically samples/python. I found that code to be very sparsely documented; it does not explain the principles.
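A sketch of the approach, modeled on my reading of samples/python/audio_spectrogram.py and assuming OpenCV 4.5.4 or newer built with a capture backend that supports audio (which backends qualify depends on the OpenCV version and platform). Audio is opened by passing CAP_PROP_AUDIO_* parameters to cv2.VideoCapture; after each grab(), retrieve() with index CAP_PROP_AUDIO_BASE_INDEX + c returns the next chunk of samples for channel c. The function name read_audio_samples is hypothetical:

```python
import numpy as np


def read_audio_samples(file_path, audio_stream=0):
    """Decode one audio stream with cv2.VideoCapture.

    Returns (channels, sample_rate), where channels[c] is a 1-D array of
    float32 samples for channel c.
    """
    import cv2 as cv  # imported here so this module loads even without OpenCV

    params = np.asarray([
        cv.CAP_PROP_AUDIO_STREAM, audio_stream,   # index of the audio stream to open
        cv.CAP_PROP_VIDEO_STREAM, -1,             # -1: do not open any video stream
        cv.CAP_PROP_AUDIO_DATA_DEPTH, cv.CV_32F,  # ask for 32-bit float samples
    ])
    cap = cv.VideoCapture(file_path, cv.CAP_ANY, params)
    if not cap.isOpened():
        raise IOError(f"could not open an audio stream in {file_path!r}")

    base = int(cap.get(cv.CAP_PROP_AUDIO_BASE_INDEX))  # retrieve() index of channel 0
    n_channels = int(cap.get(cv.CAP_PROP_AUDIO_TOTAL_CHANNELS))
    sample_rate = int(cap.get(cv.CAP_PROP_AUDIO_SAMPLES_PER_SECOND))

    chunks = [[] for _ in range(n_channels)]
    while cap.grab():  # advance to the next chunk of decoded audio
        for ch in range(n_channels):
            ok, block = cap.retrieve(None, base + ch)  # one channel per index
            if ok and block is not None:
                chunks[ch].append(block.ravel())
    cap.release()

    channels = [np.concatenate(c) if c else np.empty(0, dtype=np.float32) for c in chunks]
    return channels, sample_rate
```

Usage, with a hypothetical path: channels, rate = read_audio_samples("input.mp4") gives one sample array per channel (two for stereo) plus the sample rate, e.g. 44100. I have not verified the numeric scaling of CV_32F samples; check it against a known file before interpreting levels.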
The code in your question reads video data only. You should not name it as if it read audio data; it does not.
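What read_audio_frames actually collects are decoded pictures: VideoCapture.read() returns a uint8 array of shape (height, width, 3) whose last axis holds Blue, Green, Red intensities. So len(frames[i]) is the picture height, len(frames[i][0]) is the width (the 1080 and 1920 above are pixel counts, not sample counts), and frames[i][row][col] is one BGR pixel such as [3, 1, 1]. A small illustration with a synthetic frame, so no video file is needed; describe_video_frame is a hypothetical helper:

```python
import numpy as np


def describe_video_frame(frame: np.ndarray) -> dict:
    """Summarize one element returned by cv2.VideoCapture.read()."""
    height, width, channels = frame.shape
    return {"height": height, "width": width, "channels": channels, "dtype": str(frame.dtype)}


# Synthetic stand-in for one decoded 1080p frame
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
frame[0, 0] = (3, 1, 1)             # a very dark, slightly blue pixel (B=3, G=1, R=1)
print(describe_video_frame(frame))  # {'height': 1080, 'width': 1920, 'channels': 3, 'dtype': 'uint8'}
print(frame[0][0])                  # [3 1 1]  frame[row][column] is one B, G, R pixel
```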