google-cloud-platform dockerfile pyaudio google-speech-api audio-device

Assertion Error: Device index out of range (0 devices available; device index should be between 0 and -1 inclusive)

I am working on a speech recognition project that uses the Google Speech Recognition API. I have deployed the Django project to the GCP App Engine flexible environment using a Dockerfile.

Dockerfile:

FROM gcr.io/google-appengine/python

# Install everything in one layer; -y is needed because the build is non-interactive
RUN apt-get update && apt-get install -y libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 python3-pyaudio
RUN virtualenv -p python3.7 /env

ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH

ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt

ADD . /app

CMD gunicorn -b :$PORT main:app

app.yaml file:

runtime: custom
env: flex
entrypoint: gunicorn -b :$PORT main:app

runtime_config:
  python_version: 3

Code for taking voice input:

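import speech_recognition as sr

r = sr.Recognizer()

# sr.Microphone opens an audio device on the machine that runs this code
with sr.Microphone(device_index=0) as source:
    print("speak")
    audio = r.listen(source)
    try:
        voice_data = " " + r.recognize_google(audio)
    # The original snippet ended at the line above; these handlers are an assumed completion
    except sr.UnknownValueError:
        voice_data = ""  # speech could not be understood
    except sr.RequestError as e:
        voice_data = ""  # the recognition service could not be reached
        print("Recognition request failed: {}".format(e))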

I am getting the error: AssertionError - Device index out of range (0 devices available; device index should be between 0 and -1 inclusive). The assertion is raised in this part of the SpeechRecognition source:

        # set up PyAudio
        self.pyaudio_module = self.get_pyaudio()
        audio = self.pyaudio_module.PyAudio()
        try:
            count = audio.get_device_count()  # obtain device count
            if device_index is not None:  # ensure device index is in range
                assert 0 <= device_index < count, "Device index out of range ({} devices available; device index should be between 0 and {} inclusive)".format(count, count - 1) …
            if sample_rate is None:  # automatically set the sample rate to the hardware's default sample rate if not specified
                device_info = audio.get_device_info_by_index(device_index) if device_index is not None else audio.get_default_input_device_info()
                assert isinstance(device_info.get("defaultSampleRate"), (float, int)) and device_info["defaultSampleRate"] > 0, "Invalid device info returned from PyAudio: {}".format(device_info)
                sample_rate = int(device_info["defaultSampleRate"])
        except Exception:
            audio.terminate()

The app cannot detect an audio device when I open the URL. I need to capture voice input from the hosted web app. What can I do to resolve this issue?

Solution

The error most likely appears because an App Engine VM instance has no audio card, so PyAudio finds 0 devices. Even if a sound card and drivers were installed, the code runs on the server, so I don't see how the user's microphone could be connected to the instance.
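To see why the message mentions the odd range "between 0 and -1", the range check quoted in the question can be reproduced in isolation. This is only a sketch that mirrors the quoted assertion; `check_device_index` is a hypothetical helper, not part of SpeechRecognition, and `count=0` stands in for a host with no audio devices.

```python
def check_device_index(device_index, count):
    """Mirror of the range check quoted in the question (not the library itself)."""
    assert 0 <= device_index < count, (
        "Device index out of range ({} devices available; "
        "device index should be between 0 and {} inclusive)".format(count, count - 1)
    )


try:
    # count=0 models a server with no sound hardware, so no index can be valid
    check_device_index(0, 0)
except AssertionError as exc:
    message = str(exc)
    print(message)
```

With zero devices, `count - 1` evaluates to -1, so no value of `device_index` can pass the check. Changing the index will not help; the audio has to come from somewhere other than the server.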

This question is tagged google-speech-api, but the code you shared does not use the Speech API Client Libraries; it uses the Python package SpeechRecognition instead. If you want to use the Speech API Client Libraries, you need streaming_recognize(), and I'm afraid you will have to change the code so that it takes voice input from the web user's microphone rather than from a microphone attached to the machine running the code.

This link has an example that streams from a file. Note that streaming recognition converts speech data on the fly and does not wait for the operation to finish, unlike the other methods. I'm not a Python expert, but in this example you would need to change the following line to read from another source (the web user's microphone):

with io.open('./hello.wav', 'rb') as stream:
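As a rough sketch of what "reading from another source" could look like, the helper below reads fixed-size chunks from any binary file-like object, so the same loop works for a file on disk or for bytes received from a browser. The name `iter_audio_chunks` and the chunk size are assumptions for illustration. Each chunk would then be wrapped in a streaming request for streaming_recognize(), which is left out here.

```python
import io


def iter_audio_chunks(stream, chunk_size=4096):
    """Yield successive byte chunks from any binary file-like object."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means the stream is exhausted
            return
        yield chunk


# An in-memory buffer stands in for audio bytes received from the browser.
received = io.BytesIO(b"abcdefghij")
chunks = list(iter_audio_chunks(received, chunk_size=4))
print(chunks)
```

Because the helper only relies on `read()`, the source of the bytes can be swapped without touching the recognition code.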

In the web app, you would need something like the following (note audio: true) to read from the user's microphone; see this link for more details:

navigator.mediaDevices.getUserMedia({ audio: true, video: false })
      .then(handleSuccess);
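Once the browser has captured audio, it still has to send it to the server. One possible shape for the receiving side is sketched below as a plain WSGI callable, the kind of object gunicorn serves. It assumes the browser POSTs the raw audio bytes in the request body. The name `audio_upload_app` is hypothetical, and the step that passes the bytes to a recognizer is only marked with a comment.

```python
import io


def audio_upload_app(environ, start_response):
    """Minimal WSGI app that accepts audio bytes POSTed by the browser."""
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Allow", "POST")])
        return [b""]
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0
    # The request body holds the audio recorded in the user's browser
    audio_bytes = environ["wsgi.input"].read(length)
    # Placeholder: hand audio_bytes to the speech recognizer here
    body = "received {} bytes".format(len(audio_bytes)).encode("utf-8")
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


# Simulate one upload without starting a server
statuses = []
fake_environ = {
    "REQUEST_METHOD": "POST",
    "CONTENT_LENGTH": "5",
    "wsgi.input": io.BytesIO(b"hello"),
}
response = audio_upload_app(fake_environ, lambda status, headers: statuses.append(status))
print(statuses, response)
```

In a Django project the same idea would live in a view that reads the request body. The point is that the server receives bytes over HTTP and never opens a microphone itself.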

A complete example of this approach is the Google Cloud Speech Node with Socket Playground guide. You might be able to reuse some of its Node.js code and connect it to your current Python application. Node.js is also available in App Engine Flex.