The general idea: I created a Node.js program that interacts with multiple APIs to recreate a home assistant (like Alexa or Siri). It interacts mainly with IBM Watson. My first goal was to set up Dialogflow so that I could have a real AI processing the questions, but since the update to Dialogflow v2 I would have to use Google Cloud, and that's too much trouble for me, so I just went with a hand-made script that reads possible responses from a configurable list.
My current goal is to get an audio stream from the user and send it to my main program. I have set up an Express server. It responds with an HTML page when you GET '/'. The page is the following:
<!DOCTYPE html>
<html lang='fr'>
<head>
<script>
let state = false
function button() {
navigator.mediaDevices.getUserMedia({audio: true})
.then(function(mediaStream) {
// And here I got my stream. So now what do I do?
})
.catch(function(err) {
console.log(err)
});
}
</script>
<title>Audio recorder</title>
</head>
<body>
<button onclick='button()'>Lancer l'audio</button>
</body>
</html>
It records audio from the user when they click the button, using mediaDevices.getUserMedia().
My configuration looks like this:
What I'm looking for is a way to start the recording, then press a stop button, and have the stream sent automatically to the Node program as soon as the stop button is pressed. A stream is preferable as output because that's the input type for IBM Watson (otherwise I would have to store the file, read it, and then delete it).
Thanks for your attention.
Fun fact: The imgur ID of my image starts with "NUL", which means "NOOB" in French lol
Most browsers, but not all (I'm looking at you, Mobile Safari), support capturing and streaming audio (and video, which you don't care about) using the getUserMedia() and MediaRecorder APIs. With these APIs you can transmit your captured audio in small chunks via WebSockets, socket.io, or a series of POST requests to your Node.js server. The Node.js server can then pass them along to your recognition service. The challenge here: the audio is compressed and encapsulated in WebM. If your service accepts audio in that format, this strategy will work for you.
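As a rough sketch of the browser half, something along these lines could record the stream and forward each chunk as it becomes available. The function name `startRecording`, the already-open WebSocket `socket`, the 250 ms chunk interval, and the `'end'` text marker are all assumptions made for illustration; the `Recorder` parameter exists only so the browser's MediaRecorder can be swapped out.

```javascript
// Sketch: record a MediaStream and forward each encoded chunk over a socket.
// `socket` is assumed to be an already-open WebSocket. `Recorder` defaults to
// the browser's MediaRecorder; it is a parameter only so it can be replaced.
function startRecording(mediaStream, socket, Recorder = MediaRecorder) {
  const recorder = new Recorder(mediaStream);

  // Each dataavailable event carries a Blob of encoded audio.
  recorder.ondataavailable = (event) => {
    if (event.data && event.data.size > 0) {
      socket.send(event.data);
    }
  };

  // Fired after the final chunk has been delivered.
  recorder.onstop = () => {
    // Release the microphone.
    mediaStream.getTracks().forEach((track) => track.stop());
    // Hypothetical end-of-audio marker; the server has to agree on it.
    socket.send('end');
  };

  // Ask for a chunk roughly every 250 ms (an arbitrary choice).
  recorder.start(250);

  // The caller stops the recording with recorder.stop().
  return recorder;
}
```

In the page from the question, this would be called inside the `.then(function(mediaStream) { ... })` callback, and a second button could call `stop()` on the returned recorder.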
Or you can try using node-ogg and node-vorbis to accept and decode the audio on the server. (I haven't done this.)
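Either way, the Node side has to turn the incoming chunks back into a single stream before passing them on. A minimal sketch using only Node's built-in `stream` module might look like this. `createRecordingSink`, `addChunk`, and `finish` are hypothetical names, not part of any library, and how chunks actually arrive (WebSocket message, POST body) is left to whatever transport you pick.

```javascript
const { PassThrough } = require('node:stream');

// Hypothetical helper (not from any library): collects the chunks of one
// recording into a single readable stream.
function createRecordingSink() {
  const audio = new PassThrough();

  return {
    // Readable side: hand this to whatever consumes a stream.
    audio,

    // Call for every binary chunk (Buffer) received from the browser.
    addChunk(chunk) {
      if (!audio.writableEnded) {
        audio.write(chunk);
      }
    },

    // Call once the browser signals that recording has stopped.
    finish() {
      if (!audio.writableEnded) {
        audio.end();
      }
    },
  };
}
```

The `audio` stream can be handed to the consumer as soon as the sink is created, so nothing has to be written to disk. The data is still whatever the browser recorded, so this only helps directly if the recognition service accepts that format.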
There may be other ways. Maybe somebody who knows one will answer.