
W3C Speech To Text: output values as you speak


I've been using the speech recognition part of the W3C Web Speech API in my web app. I'd like the words to appear as I speak them, so that the user gets near-instant feedback on the word they're currently saying. At the moment, the result event only delivers the whole results array after a second or so of silence.

I've looked through the spec, but all I've found is the result event and its results list, which in my case is only built once the speaker has paused for a bit:

5.1.3 SpeechRecognition Events

result event: Fired when the speech recognizer returns a result

5.1.8 SpeechRecognitionEvent

results attribute: The array of all current recognition results for this session.

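I've also tried retrieving the results in onstart and onpause handlers:

            var recognition = new webkitSpeechRecognition();

            // start fires when the service begins listening; it carries no results
            recognition.onstart = function (event) {
                //append word
            };

            // SpeechRecognition defines no pause event, so this handler never runs
            recognition.onpause = function (event) {
                //append word
            };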

Anyone know a way to accomplish this "typing" effect of the words as you speak?


The other issue is that if the user stops speaking for a second, the results list is compiled (i.e., the result event fires), and when they start speaking again the results list is not updated.

This happens even if I set recognition.continuous = true;


Solution

  • Found it in the Google Developers introduction video.

    In addition to recognition.continuous = true, you also need recognition.interimResults = true.

    You then need to modify the logic in your onresult handler slightly to account for interim results:

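    recognition.onresult = function (event) {
      var finalText = "";
      var interimText = "";
      // event.results holds every result of the session so far
      for (var i = 0; i < event.results.length; ++i) {
        // isFinal is true once the recognizer will no longer revise this result
        if (event.results[i].isFinal) {
          finalText += event.results[i][0].transcript;
        } else {
          interimText += event.results[i][0].transcript;
        }
      }
      // transcripts are plain text, so textContent is enough
      final_span.textContent = finalText;
      interim_span.textContent = interimText;
    };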
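
For completeness, here is a minimal sketch of how the pieces above might fit together. The helper names splitTranscripts and startDictation, and the finalEl and interimEl parameters, are made up for illustration and are not part of the Web Speech API. The sketch also assumes the browser exposes the constructor as either SpeechRecognition or webkitSpeechRecognition. The transcript logic is kept in a separate function so it can be tried without a microphone.

```javascript
// Split a results list into final and interim text.
// Accepts anything shaped like a SpeechRecognitionResultList: a length,
// indexed results with an isFinal flag, and alternatives with a transcript.
function splitTranscripts(results) {
  var finalText = "";
  var interimText = "";
  for (var i = 0; i < results.length; ++i) {
    var first = results[i][0]; // first alternative of this result
    if (results[i].isFinal) {
      finalText += first.transcript;
    } else {
      interimText += first.transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Hypothetical wiring helper: call it from a browser page, passing the two
// elements that should show the settled text and the still-changing text.
function startDictation(finalEl, interimEl) {
  var Recognition = window.SpeechRecognition || window.webkitSpeechRecognition;
  if (!Recognition) {
    throw new Error("Speech recognition is not supported in this browser");
  }
  var recognition = new Recognition();
  recognition.continuous = true;      // keep listening across pauses
  recognition.interimResults = true;  // deliver results before they are final
  recognition.onresult = function (event) {
    var text = splitTranscripts(event.results);
    finalEl.textContent = text.finalText;
    interimEl.textContent = text.interimText;
  };
  recognition.start();
  return recognition;
}
```

In a page you would call something like startDictation(final_span, interim_span) from a click handler. Styling the interim element in a lighter colour makes it obvious which words may still change, which gives the "typing" effect asked about.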