Tags: python, bluetooth, speech-recognition, pyaudio

Python SpeechRecognition mic in list_microphone_names() but not in list_working_microphones()


I'm following the code listed here to build my own smart speaker. I purchased this Bluetooth speaker/mic. The mic works fine when I record audio with it in Audacity, and it also works with the following code, which uses PyAudio but not SpeechRecognition:

import pyaudio
import wave
from array import array

FORMAT=pyaudio.paInt16
CHANNELS=2
RATE=44100*2
CHUNK=1024
RECORD_SECONDS=5
FILE_NAME="RECORDING.wav"

audio=pyaudio.PyAudio() #instantiate the pyaudio

#recording prerequisites
stream=audio.open(format=FORMAT,channels=CHANNELS,
                  rate=RATE,
                  input=True,
                  frames_per_buffer=CHUNK)

#starting recording
frames=[]

for i in range(0,int(RATE/CHUNK*RECORD_SECONDS)):
    data=stream.read(CHUNK)
    data_chunk=array('h',data)
    vol=max(data_chunk)
    if(vol>=500):
        print("something said")
        frames.append(data)
    else:
        print("nothing")
    print("\n")


#end of recording
stream.stop_stream()
stream.close()
audio.terminate()
#writing to file
wavfile=wave.open(FILE_NAME,'wb')
wavfile.setnchannels(CHANNELS)
wavfile.setsampwidth(audio.get_sample_size(FORMAT))
wavfile.setframerate(RATE)
wavfile.writeframes(b''.join(frames))#append frames recorded to file
wavfile.close()

However, when I try the following code:

import speech_recognition as sr
import pyaudio

r = sr.Recognizer()

mic = sr.Microphone(device_index=1)

with mic as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source, timeout=5)

print(r.recognize_google(audio))

With this speaker/mic, it hangs indefinitely. With a USB mic (after switching device_index), it works fine. When I call list_microphone_names(), the Bluetooth mic appears in the list as 'Headset Microphone (Bluetooth H' alongside my USB mic 'Microphone (Blue Snowball)'; however, when I call list_working_microphones(), the Bluetooth mic is missing. Essentially, the library recognizes that the device exists but does not hear audio through it during r.listen().

Does anyone know what could be causing this?


Solution

  • I dug into the source code for Recognizer.listen() and found that the issue is the "energy" level it uses as a threshold to start and stop recording audio. The energy of each buffer is measured with audioop.rms(buffer, source.SAMPLE_WIDTH), and the default cutoff used to decide whether someone is speaking (energy_threshold) is 300. While listen() waits for speech, the threshold is adjusted dynamically (it can drift downward in a quiet room) until the audio level exceeds it for the first time. Then, while it is recording, it counts consecutive buffers whose energy is below the cutoff and ends the phrase once that count passes pause_buffer_count (35 with the default settings).

    The issue is that the Bluetooth mic I am using appears to pick up a lot of ambient noise (and/or is just hot garbage): even when I am not speaking, it reports an energy of 100-400, so the program thinks I am still talking. I fixed this by letting adjust_for_ambient_noise run for longer before attempting to listen.

    My secondary problem was that I did not wait for adjust_for_ambient_noise to finish before speaking. That cut off my phrase, sometimes to below the minimum audio length for transcription, which caused it to silently attempt to re-record the statement. I fixed that with a simple print("speak now") after adjust_for_ambient_noise.
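
The behaviour described in this answer can be illustrated with a small model. The sketch below is not the library's code: phrase_length() is a simplified, hypothetical stand-in for the gate inside Recognizer.listen(), with a fixed threshold and no dynamic adjustment. The buffer amplitudes (a noise floor cycling through 100-400, speech at 2000), the raised threshold of 600, and the 3-second calibration in listen_once() are illustrative assumptions, not measured or recommended values. listen_once() itself needs the SpeechRecognition package and a microphone, so it is only defined here, not called.

```python
import math


def rms(samples):
    """Root-mean-square energy of a buffer of signed integer samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def phrase_length(buffers, energy_threshold=300, pause_buffer_count=35):
    """Simplified model of the start/stop gate (hypothetical helper).

    Returns how many buffers were recorded before the phrase ended,
    or None if the closing pause never arrives (the apparent hang).
    """
    started = False
    quiet_run = 0
    recorded = 0
    for buf in buffers:
        loud = rms(buf) > energy_threshold
        if not started:
            if loud:
                started = True
                recorded = 1
            continue
        recorded += 1
        quiet_run = 0 if loud else quiet_run + 1
        if quiet_run > pause_buffer_count:
            return recorded
    return None


def constant_buffer(amplitude, size=256):
    """A buffer whose RMS energy equals `amplitude`."""
    return [amplitude] * size


def demo_session():
    """Noisy lead-in, 20 buffers of speech, then a long noisy tail."""
    noise_cycle = [100, 200, 300, 400]  # assumed noise floor, per the 100-400 range above
    lead_in = [constant_buffer(noise_cycle[i % 4]) for i in range(10)]
    speech = [constant_buffer(2000) for _ in range(20)]
    tail = [constant_buffer(noise_cycle[i % 4]) for i in range(400)]
    return lead_in + speech + tail


def listen_once(device_index=None, calibration_seconds=3, timeout=5):
    """Sketch of the fix: calibrate for longer, then prompt the speaker."""
    import speech_recognition as sr  # lazy import so the model above runs without it

    recognizer = sr.Recognizer()
    with sr.Microphone(device_index=device_index) as source:
        # Stay silent during calibration so the threshold rises above the noise floor.
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        print("speak now")
        audio = recognizer.listen(source, timeout=timeout)
    return recognizer.recognize_google(audio)


if __name__ == "__main__":
    session = demo_session()
    print(phrase_length(session, energy_threshold=300))
    print(phrase_length(session, energy_threshold=600))
```

With the threshold at 300, every fourth noise buffer in this model counts as speech, so the run of quiet buffers never gets long enough and the phrase never ends. Once the threshold sits above the noise floor, the pause is detected shortly after the speech stops.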