javascript google-chrome web speech-to-text

W3C Speech To Text: output values as you speak

I've been using the W3C Speech Recognition API (part of the Web Speech API) in my web app. I'd like the words to appear as I speak them, so that the user gets near-instant feedback on the word they're currently speaking. Currently, the result event only delivers the entire results array after a second or so of silence.

I've looked through the standards, but I've only found that it waits a bit to construct the final results list from the result event:

5.1.3 SpeechRecognition Events

result event: Fired when the speech recognizer returns a result

5.1.8 SpeechRecognitionEvent

results attribute: The array of all current recognition results for this session.

I've also tried retrieving the results in onstart and onpause methods:

            var recognition = new webkitSpeechRecognition();

            recognition.onstart = function (event) {
                //append word
            };

            recognition.onpause = function (event) {
                //append word
            };

Anyone know a way to accomplish this "typing" effect of the words as you speak?

The other issue is that if the user stops speaking for a second, the results list is compiled (i.e., the result event is fired), and they then start speaking again, the results list is not updated.

This happens even if I set recognition.continuous = true;

Solution

Found it in the Google Developers introduction video.

In addition to recognition.continuous = true, you also need to set recognition.interimResults = true.

You then need to modify the logic in your onresult handler slightly to account for interim results:

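recognition.onresult = function (event) {
  var finalText = "";   // results the recognizer has committed to
  var interimText = ""; // results that may still change as you keep speaking
  for (var i = 0; i < event.results.length; ++i) {
    // The property is isFinal (not final); [0] is the top-ranked alternative.
    if (event.results[i].isFinal) {
      finalText += event.results[i][0].transcript;
    } else {
      interimText += event.results[i][0].transcript;
    }
  }
  // final_span and interim_span are the elements that display the text.
  final_span.innerHTML = finalText;
  interim_span.innerHTML = interimText;
};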
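
For anyone who wants the whole thing in one place, here is a minimal sketch of how the setup and the handler could fit together. The helper names splitTranscripts and attachLiveTranscript are my own (hypothetical, not part of the API); only continuous, interimResults, onresult, results, isFinal and transcript come from the Web Speech API. Keeping the splitting logic in a plain function also makes it possible to try it out with fake results, without a microphone.

```javascript
// Hypothetical helper: split a SpeechRecognitionResultList-like object into
// committed (final) text and still-changing (interim) text.
function splitTranscripts(results) {
  var finalText = "";
  var interimText = "";
  for (var i = 0; i < results.length; ++i) {
    var best = results[i][0]; // top-ranked alternative for this result
    if (results[i].isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Hypothetical helper: configure a recognizer for live feedback and call
// onUpdate({ finalText, interimText }) every time a result event fires.
function attachLiveTranscript(recognition, onUpdate) {
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // deliver partial results while speaking
  recognition.onresult = function (event) {
    onUpdate(splitTranscripts(event.results));
  };
  return recognition;
}

// Browser usage (assumes Chrome's prefixed constructor and the
// final_span / interim_span elements from the handler above):
//
//   var recognition = attachLiveTranscript(
//     new webkitSpeechRecognition(),
//     function (text) {
//       final_span.textContent = text.finalText;
//       interim_span.textContent = text.interimText;
//     }
//   );
//   recognition.start();
```

With this arrangement the interim text is rebuilt on every result event, which is what produces the "typing" effect, while the final text only grows as the recognizer commits to each phrase.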