Tags: python, audio, speech-recognition, transcription

How to iterate over an audio file in 20s intervals?


I am trying to transcribe an audio file which is about 3 min long using SpeechRecognition, however, it seems to be unable to transcribe anything longer than 20 seconds. This is the code that I'm using:

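import speech_recognition as sr
from mutagen.flac import FLAC  # mutagen is used to read the audio length

r = sr.Recognizer()

# output_name and output_format are defined elsewhere (not shown)
audio = FLAC(output_name + '.' + output_format)
audio_length = audio.info.length  # length in seconds

file = sr.AudioFile(output_name + '.' + output_format)

with file as source:
    audio = r.record(source, duration=20)  # reads only the first 20 s

google = r.recognize_google(audio, language='ru-RU')
print(google)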

How can I loop this so that it transcribes 0s - 20s, then 20s - 40s and so on until the audio file ends?

I would want to avoid splitting the file into separate files of 20s length as much as possible.


Solution

  • So I figured it out. My bad for not reading the documentation of the SpeechRecognition module carefully enough, but r.record() has an offset parameter, so each pass can start 20 s further into the file!

    import math
    import speech_recognition as sr
    from mutagen.flac import FLAC  # n.b. mutagen is used only to get the audio length

    r = sr.Recognizer()
    chunk_seconds = 20

    # audio_list holds the file names without extension;
    # output_format is the extension (defined elsewhere, not shown)
    for audio_name in audio_list:
        path = audio_name + '.' + output_format
        audio_length = FLAC(path).info.length  # length of the audio file in seconds

        # round up so the last, shorter chunk is transcribed too
        number_of_iterations = max(1, math.ceil(audio_length / chunk_seconds))

        file = sr.AudioFile(path)

        for i in range(number_of_iterations):
            # re-entering the context reopens the file, so offset counts from the start
            with file as source:
                audio = r.record(source, offset=i * chunk_seconds, duration=chunk_seconds)

            google = r.recognize_google(audio, language='ru-RU')
            print(google)
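
The chunk arithmetic in the loop above can be pulled out into a small helper, which makes the boundaries easy to check on their own. This is only a sketch: `chunk_offsets` is a hypothetical name, not part of SpeechRecognition or mutagen, and it assumes the total length is already known in seconds (for example from `audio.info.length`).

```python
def chunk_offsets(total_seconds, chunk_seconds=20):
    """Yield (offset, duration) pairs that cover total_seconds in order.

    Hypothetical helper for illustration; not part of any library.
    The last pair is shortened so it never runs past the end of the audio.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    offset = 0
    while offset < total_seconds:
        # clamp the final chunk to whatever audio is left
        yield offset, min(chunk_seconds, total_seconds - offset)
        offset += chunk_seconds


# A 65 s recording gives three full chunks and one 5 s remainder.
for offset, duration in chunk_offsets(65):
    print(offset, duration)
```

Each pair can then be passed to `r.record(source, offset=offset, duration=duration)` inside the `with file as source:` block. Unlike `int(audio_length / 20)`, this does not drop the trailing partial chunk.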