python audio speech-recognition wav speech-to-text

Recording audio and Speech to text conversion using a wav file in Python

I am trying to record audio and convert it to text in Python. My code is below.

import speech_recognition as sr
import sounddevice as sd
import numpy as np
import os
from scipy.io.wavfile import write

fs = 44100  # Sample rate
seconds = 15  # Duration of recording
print("Start recording the answer.....")
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=2)
sd.wait()  # Wait until recording is finished
write('output.wav', fs, myrecording.astype(np.int16))  # Save as WAV file in 16-bit format
recognizer = sr.Recognizer()
sound = "output.wav"

with sr.AudioFile(sound) as source:
   recognizer.adjust_for_ambient_noise(source)
   print("Converting the answer to text...")
   audio = recognizer.listen(source)

   try:
      text = recognizer.recognize_google(audio)
      print("The converted text:" + text)

   except Exception as e:
      print('Exception',e)

When I play output.wav, it contains nothing but silence, so the speech-to-text conversion also throws an exception. Can someone suggest a solution? Thanks in advance.

Solution

I would try loading a different wav file to test the sounddevice and speech_recognition parts separately. I am doing something similar, and each side works on its own, but together there is an issue: sounddevice records in float32 by default, and speech_recognition seems to require integer samples such as int32. Perhaps something is going wrong where you convert to int16. If you open the file in Audacity, are you sure it's silence? I tried using wavio for the file write instead, but couldn't tell from the documentation what sampwidth should be.
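
One possible explanation for the silent file, offered as a sketch and not a confirmed diagnosis: sounddevice's default float32 samples lie between -1.0 and 1.0, so casting them straight to int16 truncates almost every sample to 0. The example below uses a synthetic tone in place of a real recording, and float_to_int16 is a hypothetical helper name, not part of any library.

```python
import numpy as np


def float_to_int16(samples):
    """Scale float samples in [-1.0, 1.0] to 16-bit integers (hypothetical helper)."""
    clipped = np.clip(samples, -1.0, 1.0)
    return (clipped * 32767).astype(np.int16)


# Stand-in for a float32 recording: one second of a 440 Hz tone at half amplitude.
fs = 44100
t = np.arange(fs) / fs
recording = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

truncated = recording.astype(np.int16)  # the cast used in the question
scaled = float_to_int16(recording)      # scale first, then cast

print("peak after plain cast:", np.abs(truncated).max())
print("peak after scaling:   ", np.abs(scaled).max())
```

If the plain cast gives a peak of 0 while the scaled version does not, the recording itself was fine and only the conversion lost the signal.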

Update: I got audio recorded with sounddevice to work with the speech_recognition library by adding this line at the beginning: sounddevice.default.dtype = 'int32', 'int32'. The default is float32 for both input and output. For some reason I don't understand, changing only the output did not fix the problem. Either soundfile or scipy works for writing the file. Also, Audacity still reports the wavs as float32... I think something else may be going on, because when I export a file from Audacity, the header looks the same as the incompatible files, but speech_recognition accepts it.
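
Here is a rough sketch of how the pieces might fit together with that setting. Several things are my assumptions and not something I have confirmed for your setup: write_pcm_wav and record_and_transcribe are hypothetical names, the file is written with the standard library's wave module instead of soundfile or scipy so that the header is plain integer PCM, and recognizer.record is used in place of listen to read the whole file. The recording function needs a microphone and network access.

```python
import wave

import numpy as np


def write_pcm_wav(path, samples, fs):
    """Write an int16 or int32 array (frames x channels) as an integer PCM WAV."""
    samples = np.asarray(samples)
    if samples.dtype not in (np.int16, np.int32):
        raise TypeError("expected int16 or int32 samples")
    channels = 1 if samples.ndim == 1 else samples.shape[1]
    # WAV data is little-endian; force that byte order before dumping bytes.
    little_endian = samples.astype(f"<i{samples.dtype.itemsize}")
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(samples.dtype.itemsize)  # 2 or 4 bytes per sample
        wf.setframerate(fs)
        wf.writeframes(little_endian.tobytes())


def record_and_transcribe(seconds=15, fs=44100, path="output.wav"):
    # Imported here so write_pcm_wav stays usable without audio hardware.
    import sounddevice as sd
    import speech_recognition as sr

    sd.default.dtype = "int32", "int32"  # the setting from the update
    recording = sd.rec(int(seconds * fs), samplerate=fs, channels=2)
    sd.wait()  # block until the recording is finished
    write_pcm_wav(path, recording, fs)

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the entire file
    return recognizer.recognize_google(audio)
```

Calling print(record_and_transcribe(seconds=5)) would record five seconds and print whatever the recognizer returns. Because the samples are already integers, no astype conversion is needed before writing.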