javascript ios safari speech-to-text webspeech-api

Poor Web Speech API recognition results in Safari

Greetings,

I am trying to implement speech recognition in my application. According to the JS documentation here, speech-to-text has been supported since Safari 14.1. I am using the following configuration:

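    // webkitSpeechRecognition is the vendor-prefixed constructor
    const { webkitSpeechRecognition } = (window as any);
    const recognition = new webkitSpeechRecognition();
    recognition.lang = 'pt-BR';
    recognition.continuous = true;       // keep listening after each result
    recognition.interimResults = false;  // deliver final results only
    recognition.maxAlternatives = 1;
    // Keep a reference so the instance is not garbage-collected mid-session
    this.garbage.push(recognition);
    recognition.start();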

On Chrome it works just fine, but on Safari the recognition results are very poor. It understands me sometimes, but it often misinterprets my words. For example, if I say "Hello assistant, change contrast", the result might be something like "Hello assist charge contract hello assist charge charge".

One peculiarity of this problem is that the only events fired by the speech recognition interface on Safari are start and audiostart.
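
One way to check which events actually fire is to attach a listener for every event name the Web Speech API defines and record each one. The sketch below is only a diagnostic aid, not a fix. `logRecognitionEvents` is a hypothetical helper name, and it accepts any `EventTarget` so it can also be tried outside the browser.

```typescript
// Event names defined for SpeechRecognition in the Web Speech API.
const RECOGNITION_EVENTS = [
  'start', 'audiostart', 'soundstart', 'speechstart',
  'result', 'nomatch', 'error',
  'speechend', 'soundend', 'audioend', 'end',
] as const;

// Hypothetical helper: records every recognition event fired on `target`.
// Returns a live array of event names, in the order they were seen.
function logRecognitionEvents(
  target: EventTarget,
  log: (name: string) => void = (name) => console.log('[speech]', name),
): string[] {
  const seen: string[] = [];
  for (const name of RECOGNITION_EVENTS) {
    target.addEventListener(name, () => {
      seen.push(name);
      log(name);
    });
  }
  return seen;
}
```

Calling `logRecognitionEvents(recognition)` before `recognition.start()` should show whether the sequence stops after `start` and `audiostart`, that is, whether `speechstart` or `result` is ever reported.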

Is anyone facing a similar issue, or has anyone found a solution to this problem? I am also open to alternatives for implementing speech recognition in my application.

Thanks in advance!

EDIT

On my end, you can see this problem by visiting any website that relies on the Web Speech API. Some examples that you can check:

https://www.google.com/chrome/demos/speech.html

https://www.audero.it/demo/web-speech-api-demo.html

Solution

So, if anyone else stumbles upon this problem: I have filed an issue on the Chromium forum. You can consult the issue here.

Basically, the Chrome team is having some problems integrating this functionality in their browser on iOS devices.

In my case, I used Hark.js to get events when the user starts and stops speaking, paired with Vosk on my backend to do the offline speech-to-text transcription.
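
A minimal sketch of how such wiring could look on the client, under stated assumptions: `SpeechEvents` and `RecorderLike` are illustrative interfaces modelled on the `speaking` / `stopped_speaking` events emitted by Hark.js and on the `start()` / `stop()` methods of `MediaRecorder`, and `captureUtterances` is a hypothetical helper name. Uploading the audio and the Vosk side are left out.

```typescript
// Shape of the emitter returned by hark(stream): only the two events used here.
interface SpeechEvents {
  on(event: 'speaking' | 'stopped_speaking', handler: () => void): void;
}

// Subset of MediaRecorder used by this sketch.
interface RecorderLike {
  readonly state: string; // 'inactive' | 'recording' | 'paused'
  start(): void;
  stop(): void;
}

// Hypothetical helper: record only while the user is speaking.
function captureUtterances(speech: SpeechEvents, recorder: RecorderLike): void {
  speech.on('speaking', () => {
    if (recorder.state === 'inactive') recorder.start();
  });
  speech.on('stopped_speaking', () => {
    // stop() makes MediaRecorder deliver its final dataavailable event
    if (recorder.state === 'recording') recorder.stop();
  });
}

// Browser wiring (assumes `hark` is imported and microphone access is granted):
//   const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   const recorder = new MediaRecorder(stream);
//   recorder.ondataavailable = (e) => { /* send e.data to the backend */ };
//   captureUtterances(hark(stream), recorder);
```

Checking `recorder.state` before each call avoids the `InvalidStateError` that `MediaRecorder` throws when `start()` or `stop()` is called in the wrong state.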

IMO the browser speech recognition API is fine if you want your app to run on a specific browser. However, if you wish to target all browsers across different operating systems, I would advise looking for a different solution.