Tags: javascript, mobile-safari, text-to-speech

speechSynthesis not working on mobile Safari even though it's supported


I'm trying to use the speechSynthesis API. It works on desktop browsers and on mobile Chrome, but not on mobile Safari.

  const msg = new SpeechSynthesisUtterance("Hello World");
  window.speechSynthesis.speak(msg);

I added a little test, and it seems the API is supported on Safari. Could a permissions issue be the reason it's not working?

  if ("speechSynthesis" in window) {
    alert("yay");
  } else {
    alert("no");
  }

Solution

  • On my end, the issue came down to loading speech synthesis properly on mobile Safari.

    There are some things to check in order:

    • are voices loaded?
    • are voices even installed on your system?
    • is the utterance configured correctly?
    • is the speak function called from within a user interaction event?

    The following example summarizes these checks and works on macOS desktop browsers as well as iOS Safari:

    let _speechSynth
    let _voices
    const _cache = {}
    
    /**
     * Retries every 100 ms until voices have been loaded. No stopper flag is included in this example.
     * Note that this function assumes that voices are installed on the host system.
     */
    
    function loadVoicesWhenAvailable (onComplete = () => {}) {
      _speechSynth = window.speechSynthesis
      const voices = _speechSynth.getVoices()
    
      if (voices.length !== 0) {
        _voices = voices
        onComplete()
      } else {
        return setTimeout(function () { loadVoicesWhenAvailable(onComplete) }, 100)
      }
    }
    
    /**
     * Returns all voices found for a given language code (cached per locale).
     */
    
    function getVoices (locale) {
      if (!_speechSynth) {
        throw new Error('Speech synthesis is not supported or not initialized')
      }
      if (!_voices) {
        throw new Error('Voices have not been loaded yet')
      }
      if (_cache[locale]) return _cache[locale]
    
      _cache[locale] = _voices.filter(voice => voice.lang === locale)
      return _cache[locale]
    }
    
    /**
     * Speak a certain text 
     * @param locale the locale this voice requires
     * @param text the text to speak
     * @param onEnd callback if tts is finished
     */
    
    function playByText (locale, text, onEnd) {
      const voices = getVoices(locale)
    
      // TODO load preference here, e.g. male / female etc.
      // TODO but for now we just use the first occurrence
      const utterance = new window.SpeechSynthesisUtterance()
      // only set a voice if one matched the locale; otherwise the browser default is used
      if (voices[0]) utterance.voice = voices[0]
      utterance.volume = 1
      utterance.rate = 1
      utterance.pitch = 0.8
      utterance.text = text
      utterance.lang = locale
    
      if (onEnd) {
        utterance.onend = onEnd
      }
    
      _speechSynth.cancel() // cancel current speak, if any is running
      _speechSynth.speak(utterance)
    }
    
    // on document ready
    loadVoicesWhenAvailable(function () {
      console.log("loaded")
    })
    
    function speak () {
      setTimeout(() => playByText("en-US", "Hello, world"), 300)
    }
    
    <!-- HTML: the button triggers speak() from a user interaction -->
    <button onclick="speak()">speak</button>

    Details on the code are added as comments within the snippet.
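
    The snippet above notes that its retry loop has no stopper flag, so it polls forever if no voices are installed. A minimal sketch of a bounded variant follows. The function name loadVoicesWithLimit and its parameters (synth, maxAttempts, intervalMs) are assumptions made for illustration and are not part of the original answer. The synth object is passed in, so that in a browser you would hand it window.speechSynthesis.

```javascript
/**
 * Polls synth.getVoices() until it returns a non-empty list.
 * Resolves with the voices, or rejects after maxAttempts polls.
 * Hypothetical helper: name and parameters are illustrative only.
 */
function loadVoicesWithLimit (synth, maxAttempts = 50, intervalMs = 100) {
  return new Promise((resolve, reject) => {
    let attempts = 0

    function poll () {
      const voices = synth.getVoices()
      if (voices.length !== 0) {
        resolve(voices)
        return
      }
      attempts += 1
      if (attempts >= maxAttempts) {
        // stopper: give up instead of retrying forever
        reject(new Error('No voices available after ' + attempts + ' attempts'))
        return
      }
      setTimeout(poll, intervalMs)
    }

    poll()
  })
}

// Usage in a browser (not executed here):
// loadVoicesWithLimit(window.speechSynthesis)
//   .then(voices => console.log('loaded', voices.length))
//   .catch(err => console.warn(err.message))
```

    With the default values this gives up after roughly five seconds. Whether that is long enough on a given device is a judgment call, so treat the numbers as placeholders.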