
How to detect speech-to-text support and what is the meaning of mozSpeechRecognition, msSpeechRecognition, oSpeechRecognition?


I'm trying to better understand when Web Speech speech-to-text is actually available and operational. Along the way I keep seeing these few lines of code all over the web, used to assess whether speech-to-text is supported by the browser:

const NativeSpeechRecognition = typeof window !== 'undefined' && (
  window.SpeechRecognition ||
  window.webkitSpeechRecognition ||
  window.mozSpeechRecognition ||
  window.msSpeechRecognition ||
  window.oSpeechRecognition
)

According to https://caniuse.com/speech-recognition, all browsers that actually support it (Chrome, Opera, UC, Samsung, Safari, QQ, Baidu) use the webkit prefix: window.webkitSpeechRecognition. The caniuse.com table says that speech-to-text does not work on Edge (up to and including 109), yet on my Windows PC it works with Edge 109.0.1518.78, both in my own JS web app (using the react-speech-recognition package) and on https://dictation.io/speech.

My web searches on "oSpeechRecognition" or "mozSpeechRecognition" or "msSpeechRecognition" have not been instructive.

I ran the following code in a sandbox to see which of these is defined.

var recognition = new (window.SpeechRecognition ||
  window.webkitSpeechRecognition ||
  window.mozSpeechRecognition ||
  window.msSpeechRecognition)();
recognition.lang = "en-US";
recognition.interimResults = false;
recognition.maxAlternatives = 5;
recognition.start();

console.log("window.SpeechRecognition", window.SpeechRecognition);
console.log("window.webkitSpeechRecognition", window.webkitSpeechRecognition);
console.log("window.mozSpeechRecognition", window.mozSpeechRecognition);
console.log("window.msSpeechRecognition", window.msSpeechRecognition);

console.log("selected recognition", recognition);

recognition.onresult = function (event) {
  console.log("You said: ", event.results[0][0].transcript);
};

On Chrome 109, the console is:

window.SpeechRecognition  undefined
window.webkitSpeechRecognition  ƒ SpeechRecognition() {}
window.mozSpeechRecognition  undefined
window.msSpeechRecognition  undefined
selected recognition  EventTarget {grammars: Object, lang: "en-US", continuous: false, interimResults: false, maxAlternatives: 5…}

On Edge 109, the console is exactly the same:

window.SpeechRecognition  undefined
window.webkitSpeechRecognition  ƒ SpeechRecognition() {}
window.mozSpeechRecognition  undefined
window.msSpeechRecognition  undefined
selected recognition  EventTarget {grammars: Object, lang: "en-US", continuous: false, interimResults: false, maxAlternatives: 5…}

On Firefox 109, the sandbox does not run.

TypeError undefined is not a constructor $csb$eval /src/index.js:1:18 var recognition = new (window.SpeechRecognition ||

I'd like to understand:

  1. Why does caniuse.com say that speech-to-text does not work on Edge?
  2. Why can't I even make the code run on Firefox?
  3. What is the origin and purpose of mozSpeechRecognition, msSpeechRecognition, and oSpeechRecognition? They are copied into many code examples and articles with no comment or explanation, as if they were ten-year-old historical code that no one understands and that is no longer useful, but that everyone still uses.

Many thanks. John.


Solution

  • I went through a difficult migration of a 2004 project with over 150 files to Webpack 5. Based on my understanding of your problem, here are my two cents:

    1. caniuse.com most likely says speech-to-text does not work on Edge because its data has not been updated to reflect the current state of Edge. Since version 79, Edge has been built on Chromium, so it exposes the same webkitSpeechRecognition constructor as Chrome, Opera, and other Chromium-based browsers. Your own test confirms this: the console output on Edge 109 is identical to the one on Chrome 109, and speech-to-text works.

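    2. Your code does not run on Firefox because Firefox 109 does not expose a speech recognition constructor by default. window.SpeechRecognition and every prefixed variant are undefined there, so the expression inside new (...)() evaluates to undefined and the call throws "TypeError: undefined is not a constructor". This is the same Web Speech API, not a different one: Firefox ships the speech synthesis half of it but keeps speech recognition disabled by default. Check for support before constructing anything, using the following code: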

        const speechRecognitionSupported = 'webkitSpeechRecognition' in window || 'SpeechRecognition' in window;
    
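    3. The names mozSpeechRecognition, msSpeechRecognition, and oSpeechRecognition come from the vendor-prefix convention that browser vendors used for experimental features: webkit for WebKit and Blink (Safari, Chrome, and other Chromium-based browsers), moz for Mozilla's Gecko (Firefox), ms for Microsoft's engines (Internet Explorer and pre-Chromium Edge), and o for Opera's old Presto engine. Snippets from that era listed every prefix defensively, whether or not a browser ever shipped that name, and the pattern has been copied ever since. As your own console output shows, only webkitSpeechRecognition is defined in Chrome and Edge 109, and window.SpeechRecognition is still undefined there, so you cannot rely on the unprefixed name alone. Checking window.SpeechRecognition first and falling back to window.webkitSpeechRecognition covers the supporting browsers you listed from caniuse.com; the moz, ms, and o entries are harmless but can be dropped.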
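
Putting the three points together, here is a minimal sketch of a guarded version of the sandbox code from the question. The helper names getSpeechRecognition and createRecognizer are hypothetical (they are not part of the Web Speech API or of react-speech-recognition), and the optional globalObject parameter is an assumption added only so the helpers can be exercised outside a browser.

```javascript
// Hypothetical helper: returns the SpeechRecognition constructor exposed by
// the given global object, or null if there is none.
function getSpeechRecognition(globalObject) {
  // Use the real window in a browser, or an empty object elsewhere (e.g. Node).
  const root = globalObject || (typeof window !== "undefined" ? window : {});
  // Unprefixed name first, then the webkit-prefixed name seen in Chrome and Edge.
  return root.SpeechRecognition || root.webkitSpeechRecognition || null;
}

// Hypothetical helper: builds a configured recognizer, or returns null
// instead of throwing "undefined is not a constructor".
function createRecognizer(globalObject) {
  const Recognition = getSpeechRecognition(globalObject);
  if (!Recognition) {
    return null;
  }
  const recognition = new Recognition();
  recognition.lang = "en-US";
  recognition.interimResults = false;
  recognition.maxAlternatives = 5;
  return recognition;
}

const recognition = createRecognizer();
if (recognition) {
  // Attach the handler before starting so no result can be missed.
  recognition.onresult = function (event) {
    console.log("You said: ", event.results[0][0].transcript);
  };
  recognition.start();
} else {
  console.log("Speech recognition is not available in this environment.");
}
```

With this guard, a browser that exposes neither constructor takes the else branch instead of failing on the first line, so the app can hide its microphone button or show a message.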