How to disable sentence-level auto correction in Google Cloud Speech-to-Text API

I am working on a speech recognition task that involves assessing children's speaking ability and how it improves over time...

I'd like to use the Google Cloud Speech-to-Text API for the ASR part of the assessment, and then compare the transcripts from different measurement sessions to estimate each child's progress.

However, the sentence-level autocorrection of the Google Speech API consistently rewrites earlier parts of the spoken sentence...

Is there a way to disable this ASR's autocorrection?

I haven't been able to work around this with the "speechContext", "single_utterance", or "maxAlternatives" options.

"single_utterance" may work for isolated words, but it still corrects mispronounced words.

Any advice on this?

Solution

If you use streaming recognition instead of batch recognition and enable interim results, you should receive a hypothesis as soon as each part of the audio is transcribed, without waiting for the rest of the sentence. Store only the first hypothesis the stream provides for each part of the audio, and ignore the later corrections.

This means that you don't have to wait until isFinal=True.
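A minimal sketch of this idea in Python, assuming the `google-cloud-speech` client library and 16 kHz mono LINEAR16 audio delivered as byte chunks. `recognize_stream`, `first_hypotheses` and `fake_response` are hypothetical names for illustration, and "freezing" each word by its position is my own crude heuristic, not an API feature. The only API-level requirement is `interim_results=True`, which makes partial hypotheses arrive before `is_final` (the Python name for `isFinal`).

```python
from types import SimpleNamespace


def first_hypotheses(responses):
    """Keep the earliest-seen word at each position, ignoring later rewrites.

    `responses` is any iterable shaped like StreamingRecognizeResponse objects:
    each has `.results`; each result has `.is_final` and `.alternatives`; each
    alternative has `.transcript`. Hypothetical heuristic: words are aligned by
    position only, so later hypotheses can append words but never change them.
    """
    transcripts = []  # one entry per finished utterance
    frozen = []       # words already committed for the current utterance
    for response in responses:
        interim_words = []
        for result in response.results:
            if not result.alternatives:
                continue
            words = result.alternatives[0].transcript.split()
            if result.is_final:
                # Only take words that no interim hypothesis covered yet.
                frozen.extend(words[len(frozen):])
                transcripts.append(" ".join(frozen))
                frozen = []
            else:
                # Interim results in one response concatenate into one hypothesis.
                interim_words.extend(words)
        frozen.extend(interim_words[len(frozen):])
    if frozen:  # stream ended before a final result arrived
        transcripts.append(" ".join(frozen))
    return transcripts


def recognize_stream(audio_chunks, language_code="en-US", sample_rate_hertz=16000):
    """Stream raw LINEAR16 chunks (bytes) and return the first-seen transcripts."""
    # Assumes `pip install google-cloud-speech` and configured credentials.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate_hertz,
        language_code=language_code,
    )
    # interim_results=True: partial hypotheses are sent before is_final.
    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk) for chunk in audio_chunks
    )
    responses = client.streaming_recognize(config=streaming_config, requests=requests)
    return first_hypotheses(responses)


def fake_response(transcript, is_final=False):
    """Offline stand-in for a one-result streaming response (for trying it out)."""
    alternative = SimpleNamespace(transcript=transcript)
    result = SimpleNamespace(is_final=is_final, alternatives=[alternative])
    return SimpleNamespace(results=[result])


if __name__ == "__main__":
    stream = [
        fake_response("I goed"),
        fake_response("I goed to"),
        fake_response("I went to the park", is_final=True),
    ]
    print(first_hypotheses(stream))  # ['I goed to the park']
```

In the offline run, the child's "goed" survives even though the final result rewrites it as "went". Because the alignment is purely positional, a later hypothesis that merges or splits words will shift the tail, so for tracking progress it may be worth storing the final transcript alongside the first-seen one.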

For a quick and dirty demonstration of what I mean, go to the Speech API page and run the streaming test with your browser's developer tools open. There you'll see the streaming data arrive while the words are still being spoken.