I am trying to record audio and convert it to text in Python. My code is below.
import speech_recognition as sr
import sounddevice as sd
import numpy as np
import os
from scipy.io.wavfile import write
fs = 44100 # Sample rate
seconds = 15 # Duration of recording
print("Start recording the answer.....")
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=2)
sd.wait() # Wait until recording is finished
write('output.wav', fs, myrecording.astype(np.int16)) # Save as WAV file in 16-bit format
recognizer = sr.Recognizer()
sound = "output.wav"
with sr.AudioFile(sound) as source:
    recognizer.adjust_for_ambient_noise(source)
    print("Converting the answer to text...")
    audio = recognizer.listen(source)
    try:
        text = recognizer.recognize_google(audio)
        print("The converted text:" + text)
    except Exception as e:
        print('Exception', e)
When I play output.wav, it is silent, so the speech-to-text conversion also raises an exception. Can someone please suggest a solution? Thanks in advance.
I would try loading a different WAV file to test the sounddevice and speech_recognition parts separately. I am doing something similar, and each side works on its own, but together they fail because sounddevice records in float32 by default and speech_recognition seems to require int32. Perhaps something is going wrong where you convert to int16. If you open the file in Audacity, are you sure it's silent? I tried using wavio to write the file instead, but couldn't tell from the documentation what the sampwidth should be.
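To illustrate how the int16 conversion can go wrong: sounddevice's default float32 samples lie in the range -1.0 to 1.0, and casting them straight to int16 truncates every sample to 0. The sketch below uses a synthetic tone as a stand-in for a real recording (an assumption, so it runs without a microphone) and compares a direct cast with scaling to the int16 range first.

```python
import numpy as np

fs = 44100  # Sample rate, as in the question
t = np.arange(fs) / fs

# Synthetic stand-in for a float32 recording: a 440 Hz tone with peak amplitude 0.5
recording = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

# A direct cast truncates toward zero, so every sample in (-1, 1) becomes 0
truncated = recording.astype(np.int16)

# Scaling to the int16 range before casting preserves the waveform
scaled = (np.clip(recording, -1.0, 1.0) * 32767).astype(np.int16)

print("peak after direct cast:", np.abs(truncated).max())
print("peak after scaling:    ", np.abs(scaled).max())
```

If the direct cast shows a peak of 0 while the scaled version does not, the silence in output.wav most likely comes from the conversion rather than from the microphone.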
Update: I got sounddevice recordings to work with the speech_recognition library by adding this line at the beginning: sounddevice.default.dtype = 'int32', 'int32'
The default dtype is float32 for both input and output. For a reason I don't understand, changing only the output did not fix the problem. Either soundfile or scipy works for writing the file.
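As a rough sketch of the same idea, the dtype can also be passed to sd.rec for a single call instead of changing the global default. The helper names record_pcm and save_wav are hypothetical, and the choice of int16 and one channel is an assumption made for illustration. sounddevice is imported inside the function so the rest of the code can load on a machine without audio hardware.

```python
import numpy as np
from scipy.io.wavfile import write

FS = 44100  # Sample rate, as in the question


def record_pcm(seconds, fs=FS, dtype="int16"):
    """Record mono audio as integer PCM samples (hypothetical helper)."""
    import sounddevice as sd  # imported lazily; needs a working input device

    recording = sd.rec(int(seconds * fs), samplerate=fs, channels=1, dtype=dtype)
    sd.wait()  # Block until the recording is finished
    return recording


def save_wav(path, recording, fs=FS):
    """Write integer samples to a PCM WAV file, refusing float data."""
    if not np.issubdtype(recording.dtype, np.integer):
        # scipy would write float data as a float WAV, which is the case
        # this sketch is trying to avoid
        raise TypeError(f"expected integer samples, got {recording.dtype}")
    write(path, fs, recording)
```

A call such as save_wav("output.wav", record_pcm(15)) would then replace the sd.rec and write lines from the question, with no astype conversion needed.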
Also, Audacity still reports the WAVs as float32. I think something else may be going on, because when I export a file from Audacity, the header looks the same as in the incompatible files, but speech_recognition accepts it.
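One way to compare the headers directly is to read the format code from the file's 'fmt ' chunk. In the WAV format, code 1 means integer PCM, 3 means IEEE float, and 0xFFFE means the extensible format. The function name wav_format_tag is hypothetical, and this is only a minimal sketch using the standard library, not a full WAV parser.

```python
import struct


def wav_format_tag(path):
    """Return the audio format code stored in the 'fmt ' chunk of a WAV file."""
    with open(path, "rb") as f:
        riff, _, wave_id = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or wave_id != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                raise ValueError("no 'fmt ' chunk found")
            chunk_id, chunk_size = struct.unpack("<4sI", header)
            if chunk_id == b"fmt ":
                # The format code is the first 16-bit field of the chunk
                (format_tag,) = struct.unpack("<H", f.read(2))
                return format_tag
            # Skip other chunks; chunk data is padded to an even length
            f.seek(chunk_size + (chunk_size & 1), 1)
```

Running this on a file that speech_recognition rejects and on one it accepts should show whether the two really share the same format code.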