I am new to AWS services, and we want to build a simple demo that detects a special word and then: [1] triggers an action, and [2] responds with speech during the call.
For example, if the user says "Help", I want to reply "OK" and run an operation (an AWS Lambda function).
We're using Twilio, and Twilio should stream the audio.
As I understand it, I have two options, Amazon Lex and Amazon Transcribe: Lex is for bots, while Transcribe only converts speech to text and can't take part in the conversation.
So the questions are:
Which services should I use to trigger an action when the special word is recognized AND take part in the conversation?
Can I stream the call directly to an AWS service via Twilio?
To be clearer: the call is between two people in real time, and I want to interject during their call. When someone says "Help", I want to add a bot voice to the conversation that says "OK", for example:
[Person 1]: Hi, how are you
[Person 2]: HELP ...
[BOT]: OK (like a third person in a conference call).
I am not fully clear on the interaction taking place with the user before they interject with "Help". Are they listening to a bot, a media file, or TTS, or are they communicating with another person in real time?
For real-time analysis, you would need to use Twilio Media Streams, which streams the call audio to a service that can convert the speech to text in near real time, look for keywords, and then programmatically perform some action based on those keywords.
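To make that flow a bit more concrete, here is a minimal sketch in Python of the two ends of the pipeline: unpacking the audio from a Media Streams WebSocket message, and checking a transcript for the keyword. It assumes the messages are JSON with an `event` field and a base64-encoded `media.payload`, which is how Media Streams delivers audio. The speech-to-text step in the middle (for example Amazon Transcribe streaming or Lex) is left out, and the function names `decode_media_message`, `contains_keyword`, and `handle_transcript` are made up for illustration, not part of any Twilio or AWS SDK.

```python
import base64
import json
import re
from typing import Callable, Optional

KEYWORD = "help"  # the special word to listen for (assumption for this demo)


def decode_media_message(raw: str) -> Optional[bytes]:
    """Return the audio bytes from a 'media' message, or None for other events."""
    msg = json.loads(raw)
    if msg.get("event") != "media":
        # 'connected', 'start', 'stop', etc. carry no audio
        return None
    return base64.b64decode(msg["media"]["payload"])


def contains_keyword(transcript: str, keyword: str = KEYWORD) -> bool:
    """Case-insensitive whole-word match, so 'helpful' does not count as 'help'."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return re.search(pattern, transcript, re.IGNORECASE) is not None


def handle_transcript(transcript: str, on_keyword: Callable[[], None]) -> Optional[str]:
    """Run the action and return the text the bot should speak, if the keyword was said."""
    if contains_keyword(transcript):
        on_keyword()  # e.g. invoke a Lambda function here
        return "OK"
    return None
```

In a real setup, the bytes returned by `decode_media_message` would be forwarded to the speech-to-text service, and each transcript it returns would be passed to `handle_transcript`. Speaking the returned "OK" back into the call is a separate step that this sketch does not cover.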
An example of using Twilio Media Streams with Lex:
Use Amazon Lex as a conversational interface with Twilio Media Streams