I'm currently working on a tool that lets me read all my notifications by connecting to different APIs.
It's working great, but now I would like to add some voice commands to trigger actions.
For example, when the software says "One mail from Bob", I would like to be able to say "Read it" or "Archive it".
My software runs on a Node server. I don't currently have a browser implementation, but that could be planned.
What is the best way to enable speech-to-text in Node.js?
I've seen a lot of threads on it, but they mainly use the browser, and I would like to avoid that at first if possible. Is it possible?
Another issue is that some software requires a WAV file as input. I don't have any file; I just want my software to always be listening so it can react when I say a command.
Do you have any information on how I could do that?
Cheers
Both of the existing answers are good, but I think what you're looking for is Sonus. It takes care of audio encoding and streaming for you. It's always listening offline for a customizable hotword (like Siri or Alexa), and you can also trigger listening programmatically. Combined with a module like say, you could handle your example by doing something like:
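// voice and speed are left as null so say uses its defaults
say.speak('One mail from Bob', null, null, function (err) {
  if (err) return console.error(err);
  Sonus.trigger(sonus, 1); // start listening
});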
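Once you have the recognized text back, you still need to map it to an action. Here is a minimal sketch of that step, assuming the transcript arrives as a plain string. The command table and the names normalize and matchCommand are made up for illustration and are not part of Sonus or say:

```javascript
// Hypothetical command table: spoken phrase -> action name.
const COMMANDS = new Map([
  ['read it', 'read'],
  ['archive it', 'archive'],
]);

// Lowercase the transcript, drop punctuation, and collapse whitespace
// so that "Read it." and "read  it" compare equal.
function normalize(transcript) {
  return transcript
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, '')
    .replace(/\s+/g, ' ')
    .trim();
}

// Return the action name for a transcript, or null if nothing matches.
function matchCommand(transcript) {
  return COMMANDS.get(normalize(transcript)) || null;
}
```

You would call matchCommand with whatever text the recognizer returns and then act on the notification that was just announced. Exact matching is probably too strict for real speech, so treat this as a starting point.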
You can also use different hotwords to handle the subsequent recognized speech in a different way. For instance:
"Notifications. Most recent." and "Send message. How are you today?", where "Notifications" and "Send message" are the hotwords.
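A rough sketch of how that routing could look: remember which hotword fired, then pass the next transcript to that hotword's handler. The createRouter helper, the keyword strings, and the handlers are all hypothetical. The Sonus event names in the trailing comment are from memory, so check them against the API docs:

```javascript
// Builds a router that remembers the last hotword and sends the next
// transcript to that hotword's handler.
function createRouter(handlers) {
  let active = null; // hotword currently waiting for its transcript

  return {
    onHotword(keyword) {
      // Unknown hotwords disarm the router instead of throwing.
      active = handlers.has(keyword) ? keyword : null;
    },
    onFinalResult(transcript) {
      if (active === null) return null; // no hotword armed, ignore
      const handler = handlers.get(active);
      active = null; // one transcript per hotword
      return handler(transcript);
    },
  };
}

// Hypothetical handlers; real ones would call your notification
// and messaging code instead of returning plain objects.
const router = createRouter(new Map([
  ['notifications', (text) => ({ type: 'notifications', filter: text })],
  ['send message', (text) => ({ type: 'message', body: text })],
]));

// With Sonus, the wiring would be roughly (assumed, not executed here):
//   sonus.on('hotword', (index, keyword) => router.onHotword(keyword));
//   sonus.on('final-result', (text) => router.onFinalResult(text));
```

Keeping the router free of any Sonus dependency means you can test the command logic without a microphone attached.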
Throw that onto a Pi or a CHIP with a microphone on your desk and you have a personal assistant that reads your notifications and reacts to commands.
Simple Example:
https://twitter.com/_evnc/status/811290460174041090
Something a bit more complex:
https://youtu.be/pm0F_WNoe9k?t=20s
Full documentation:
https://github.com/evancohen/sonus/blob/master/docs/API.md
Disclaimer: This is my project :)