I've been using the W3C Web Speech API's speech recognition in my web app. I'd like the words to appear as I speak them, so that the user gets near-instant feedback on the word they're currently speaking. Currently, the result event only fires with the entire array after a second or so of silence.
I've looked through the standards, but I've only found that it waits a bit to construct the final results list from the result event:
5.1.3 SpeechRecognition Events
result event: Fired when the speech recognizer returns a result
5.1.8 SpeechRecognitionEvent
results attribute: The array of all current recognition results for this session.
I've also tried retrieving the results in the onstart and onpause handlers:
recognition = new webkitSpeechRecognition();
recognition.onstart = function (event) {
  // append word
};
recognition.onpause = function (event) {
  // append word
};
Anyone know a way to accomplish this "typing" effect of the words as you speak?
The other issue is that if the user stops speaking for a second and the results list is compiled (i.e., the result event is fired), and they then start speaking again, the results list is not updated.
This happens even if I set recognition.continuous = true;
Found it in the Google Developers introduction video.
In addition to recognition.continuous = true, you also need recognition.interimResults = true.
Then you need to modify the logic in your onresult handler slightly to account for interim results:
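recognition.onresult = function (event) {
  // Rebuild both strings from the full results list on every event.
  var finalText = "";
  var interimText = "";
  for (var i = 0; i < event.results.length; ++i) {
    // isFinal is true once the recognizer has settled on this result.
    if (event.results[i].isFinal) {
      finalText += event.results[i][0].transcript;
    } else {
      interimText += event.results[i][0].transcript;
    }
  }
  final_span.innerHTML = finalText;
  interim_span.innerHTML = interimText;
};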
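For anyone who wants to try the splitting logic outside the browser, here is a minimal sketch that pulls it into a standalone function. The name splitTranscripts and the element ids "final_span" and "interim_span" are my own assumptions, not part of the API; the browser wiring at the bottom only runs where webkitSpeechRecognition exists.

```javascript
// Hypothetical helper (not part of the Web Speech API): splits a results list
// into finalized and interim text. Accepts any array-like whose entries have
// an isFinal flag and whose first alternative has a transcript string.
function splitTranscripts(results) {
  var finalText = "";
  var interimText = "";
  for (var i = 0; i < results.length; ++i) {
    var best = results[i][0]; // first (top-ranked) alternative
    if (results[i].isFinal) {
      finalText += best.transcript;
    } else {
      interimText += best.transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}

// Browser-only wiring; skipped in environments without the prefixed API.
// Assumes two elements with ids "final_span" and "interim_span" exist.
if (typeof webkitSpeechRecognition !== "undefined") {
  var recognition = new webkitSpeechRecognition();
  recognition.continuous = true;     // keep listening across pauses
  recognition.interimResults = true; // deliver results before they are final
  recognition.onresult = function (event) {
    var text = splitTranscripts(event.results);
    document.getElementById("final_span").textContent = text.finalText;
    document.getElementById("interim_span").textContent = text.interimText;
  };
  recognition.start();
}
```

Styling the interim span in a lighter color than the final one makes the "typing" effect easier to read, since interim words can still change until they are finalized.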