I am trying to take a streamed audio and convert it to text using Google text to speech. Then pass that that text as input to a conversation not on Watson. Watson then returns its answer. The latter half works great.
The issue I am having is that I can't get the script to pass the text from the recorded speech to the Watson service I created.
I don't get an error, I just get nothing. The mic is working (I tested it with another script). The program actually indicates I could understand my response (no text I assume). Here is my code
import os
import watson_developer_cloud
import speech_recognition as sr
from gtts import gTTS
import watson_developer_cloud
import time
# Set up Assistant service.
service = watson_developer_cloud.AssistantV1(
#username = 'USERNAME', # replace with service username
#password = 'PASSWORD', # replace with service password
iam_api_key = 'xxxxxxxxxx', # replace with service username
url = 'xxxxxxxxxx', # replace with service password
version = 'xxxxxxxxxx'
)
workspace_id = 'xxxxxxxxxxxxxx' # replace with workspace ID
def getaudiodevices():
devices = os.popen("arecord -l")
device_string = devices.read()
device_string = device_string.split("\n")
for line in device_string:
if line.find("card") != -1:
print("hw:" + line[line.find("card") + 5] + "," + line[line.find("device") + 7])
def speak(audiostring):
print(audiostring)
tts = gTTS(text=audiostring, lang='en')
tts.save('audio.mp3')
os.system('mpg321 audio.mp3')
def recordaudio():
# Record Audio
r = sr.Recognizer()
with sr.Microphone(0) as source:
print("Say something!")
audio = r.listen(source,phrase_time_limit=10)
# Speech recognition ******
data = " "
try:
data = r.recognize_google(audio)
print("You said: " + data)
except sr.UnknownValueError:
print("Google Speech Recognition could not understand audio")
except sr.RequestError as e:
print("Could not request results from Google Speech Recognition service; {0}".format(e))
return data
# Initialize with empty value to start the conversation.
user_input = ''
context = {}
current_action = ''
# Main input/output loop
while current_action != 'end_conversation':
# Send message to Assistant service.
response = service.message(
workspace_id = workspace_id,
input = {
'text': user_input
},
context = context
)
# Print the output from dialog, if any.
if response['output']['text']:
print(response['output']['text'][0])
speak(response['output']['text'][0])
# Update the stored context with the latest received from the dialog.
context = response['context']
# Check for action flags sent by the dialog.
if 'action' in response['output']:
current_action = response['output']['action']
# User asked what time it is, so we output the local system time.
if current_action == 'display_time':
print('The current time is ' + time.strftime('%I:%M:%S %p') + '.')
speak('The current time is ' + time.strftime('%I:%M:%S %p') + '.')
# If we're not done, prompt for next round of input.
if current_action != 'end_conversation':
user_input = input('>> ')
Currently I can write the speech from keyboard and it works. I want the user input to come from the text generated from the transcribes audio using Google Text to speech. I need to get the data from the recorded audio into the main part of my Python script where it is communicating with the Watson service.
So, the solution by @Denziloe was correct with a few more additions.
In the main I put a while loop for while the microphone is active, I then initiated user input to start the service connection. Then I put in another loop with data = recordaudio() and set the user input to data.
It now works.
Thanks everyone.