Tags: python, python-3.x, speech-recognition, speech-to-text, cmusphinx

Python: avoid file I/O with the speech_recognition or PocketSphinx libraries


I often run into a problem with Python libraries whose functions require a file path as a parameter. This is an issue when the data I want to pass to the library function is already in memory. What I end up doing in these cases is:

  1. write a temporary file containing the data.
  2. pass the temporary file path to the library function.
  3. remove the file after the function returns.

This works well enough, but for time-sensitive applications the file I/O involved in writing the temporary file and then reading it back is a deal breaker.
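
For reference, the three-step workaround above can be sketched roughly as follows. This is only an illustration: `call_with_temp_file` is a hypothetical helper name, and it uses the standard `tempfile` module rather than a fixed file name so that concurrent runs do not overwrite each other's files.

```python
import os
import tempfile


def call_with_temp_file(data, func, suffix=".wav"):
    """Write data to a temporary file, call func(path), then delete the file.

    Hypothetical helper; func stands in for any library function that
    only accepts a file path.
    """
    # Step 1: write a temporary file containing the data.
    fd, temp_path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        # Step 2: pass the temporary file path to the library function.
        return func(temp_path)
    finally:
        # Step 3: remove the file, even if the function raised.
        os.remove(temp_path)


# os.path.getsize stands in for a path-only library call here.
size = call_with_temp_file(b"abc", os.path.getsize)
```

Every call still pays for one write and one read on disk, which is the cost in question.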

Does anyone have a solution to this problem? I suspect there is no one-size-fits-all answer, but I don't want to make any assumptions. Let me describe my current use case, and hopefully someone will be able to help me with that specifically.

I am using the speech_recognition library to convert a large number of audio files to text. I have the data for the audio files in a binary form. Here is my code:

from os import path, remove

from scipy.io.wavfile import write
import speech_recognition as sr

audio_list = ...  # get the audio
rate = ...  # sample rate of the audio

# path of the temporary file, next to this script
# (__file__ must not be quoted, otherwise it is treated as a literal file name)
audio_file = path.join(path.dirname(path.realpath(__file__)), 'temp.wav')

recognizer = sr.Recognizer()  # one recognizer can be reused for every item
text_list = []

for item in audio_list:
    # create temporary file, writing it as a wave for speech_recognition to read
    write(audio_file, rate, item)

    # this is where I need to have the path to the file
    with sr.AudioFile(audio_file) as source:
        audio = recognizer.record(source)

    text = recognizer.recognize_sphinx(audio)
    text_list.append(text)

    # delete the temporary file before the next iteration
    remove(audio_file)

The speech_recognition library uses PocketSphinx as a backend. PocketSphinx has its own Python API, but I had no luck with that either.

Can anyone help me reduce this file IO?


Solution

  • The sr.AudioFile constructor also accepts a 'file-like object', and SciPy should be able to write to one. In your case, it sounds like io.BytesIO would be a good fit. It is a file-like object built around an in-memory buffer.

    Make one, then use it like you would any other file-like object:

    import io
    
    ...
    
    buffer = io.BytesIO()
    
    ...
    
    write(buffer, rate, item)
    
    ...
    
    with sr.AudioFile(buffer) as source:
        audio = recognizer.record(source)
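
As a minimal, dependency-free sketch of the same idea, the snippet below builds a WAV file inside an io.BytesIO buffer and reads it back without touching the disk. It uses the standard library `wave` module as a stand-in for SciPy's writer; `pcm_to_wav_buffer` is a hypothetical helper name, and the mono, 16-bit defaults are assumptions made for the illustration.

```python
import io
import wave


def pcm_to_wav_buffer(pcm_bytes, rate, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container held entirely in memory.

    Hypothetical helper; the defaults (mono, 16-bit samples) are assumptions.
    """
    buffer = io.BytesIO()
    # wave.open accepts a file-like object as well as a path.
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(rate)
        writer.writeframes(pcm_bytes)
    # Rewind so the next reader starts at the WAV header.
    buffer.seek(0)
    return buffer


# One second of 16-bit mono silence at 16 kHz, used as placeholder audio.
silence = b"\x00\x00" * 16000
buffer = pcm_to_wav_buffer(silence, rate=16000)

# Any consumer that accepts a file-like object can now read the buffer.
with wave.open(buffer, "rb") as reader:
    frame_count = reader.getnframes()
    frame_rate = reader.getframerate()
```

The returned buffer can be passed to sr.AudioFile(buffer) in the same way as in the snippet above. The explicit seek(0) is harmless if the writer already rewinds the buffer, and it guards against writers that leave the position at the end of the data.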