
How is audio represented in cv2?


I have a function that is meant to read the audio signal associated with each video frame:

import cv2

def read_audio_frames(file_path):
    # Open the video file
    video = cv2.VideoCapture(file_path)

    # Get audio properties
    frame_rate = video.get(cv2.CAP_PROP_FPS)
    num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))

    # Read the audio frames
    audio_frames = []
    for _ in range(num_frames):
        ret, frame = video.read()
        
        if not ret:
            break
        
        audio_frames.append(frame)
    
    # Release the video object
    video.release()

    return audio_frames

where

# Read audio frames
audio_frames = read_audio_frames(file_path)

audio_frames[0] # in each separate audio frame
len( audio_frames[0][0] ) # = 1080; not sure why, since the sample rate is 44100
audio_frames[0][0][0]

returned a value of

array([3, 1, 1], dtype=uint8)

However, I don't quite understand why.

First, this is stereo audio. Shouldn't the result consist of two values, one per channel, instead of three? The values do not seem correct either: uint8 is 8-bit and only covers 0 to 255, clearly less than the 16-bit, 24-bit, or 32-bit depths used in audio formats.

Also, I don't understand why there is a third index, audio_frames[0][0][0]. The first index, audio_frames[0], corresponds to the video frame. But then audio_frames[0][0] should already correspond to an audio sample (at the 44100 Hz sample rate), not audio_frames[0][0][0].

A different index returned the same value:

audio_frames[15][2][0]
array([3, 1, 1], dtype=uint8)

Notice that for a different .mp4 file

len(audio_frames[0])=300
len(audio_frames[0][0])=1920
len(audio_frames[0][0][0])=3

This indicates that len(audio_frames[0][0]) does not correspond to the number of audio samples.

How is audio represented in cv2? I want to write a function that reads the audio signal and reports its level in dB.


Solution

  • OpenCV can read audio data from media files.

    This is not documented as of July 2023.

    There is some example code in various places in the samples/ directory, samples/python specifically. I found that code to be very sparsely documented. It does not explain the principles.

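    The code in your question reads video data only. Its names should not suggest that it reads audio, because it does not. Each element of audio_frames is one video frame: an image array indexed as [row][column][channel], so array([3, 1, 1], dtype=uint8) is the three 8-bit colour values of a single pixel.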
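
To make the point concrete, here is a minimal sketch that does not touch any media file. It builds a synthetic array with the same layout as a frame returned by video.read(), using the 300 x 1920 x 3 lengths reported for the second .mp4 in the question, and then shows how a level in dB could be computed once real audio samples have been obtained by some other means. The helper names describe_frame and rms_dbfs are hypothetical, and the 16-bit full-scale value of 32768 is an assumption about the sample format, not something OpenCV provides.

```python
import numpy as np


def describe_frame(frame):
    """Return (height, width, channels, dtype name) of a decoded video frame."""
    height, width, channels = frame.shape
    return height, width, channels, frame.dtype.name


def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of PCM samples in dB relative to full scale (dBFS).

    full_scale=32768.0 assumes signed 16-bit samples (an assumption here).
    """
    samples = np.asarray(samples, dtype=np.float64)
    rms = np.sqrt(np.mean(samples ** 2))
    if rms == 0:
        return -np.inf  # digital silence has no finite level
    return 20.0 * np.log10(rms / full_scale)


# Synthetic stand-in for one frame from video.read():
# rows x columns x 3 colour channels (OpenCV orders them blue, green, red).
frame = np.zeros((300, 1920, 3), dtype=np.uint8)
frame[0, 0] = (3, 1, 1)  # one nearly black pixel

print(describe_frame(frame))  # (300, 1920, 3, 'uint8')
print(frame[0][0])            # [3 1 1]
```

With this layout, len(frame) is the image height, len(frame[0]) is the width, and len(frame[0][0]) is always 3, which matches the lengths in the question. rms_dbfs only becomes useful once you have an actual array of audio samples; the frames above never contain any.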