python google-speech-api google-speech-to-text-api

Can you pin the model version of Google Speech to Text?

I would like to transcribe audio using the Google Speech-to-Text (STT) API, but I need the transcriptions to be consistent over time. In other words, even if Google improves the STT model, is it possible to pin the version of the model I originally used so that the transcriptions stay the same? I'm using the Google Speech Python client library.

Solution

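Unfortunately, it is not possible to pin a specific version of the STT model. What I can suggest, to stay somewhat consistent, is to set the model field explicitly in your STT RecognitionConfig() so that the service does not auto-select a model for you. This fixes which model is used, not which version of it. The documentation describes the field as follows: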

model

Which model to select for the given request. Select the model best suited to your domain to get best results. If a model is not explicitly specified, then we auto-select a model based on the parameters in the RecognitionConfig.

Model | Description
command_and_search | Best for short queries such as voice commands or voice search.
phone_call | Best for audio that originated from a phone call (typically recorded at an 8 kHz sampling rate).
video | Best for audio that originated from video or includes multiple speakers. Ideally the audio is recorded at a 16 kHz or greater sampling rate. This is a premium model that costs more than the standard rate.
default | Best for audio that is not one of the specific audio models. For example, long-form audio. Ideally the audio is high-fidelity, recorded at a 16 kHz or greater sampling rate.
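
As a rough sketch of the suggestion above, this is how the model could be set explicitly with the Python client library (google-cloud-speech, version 2.x or later is assumed). The helper names pinned_model_kwargs and transcribe, the en-US language code, the 16000 Hz sample rate and the LINEAR16 encoding are all assumptions made for illustration; adjust them to match your audio. Note that this only fixes the model name, so results may still change if Google updates the model behind that name.

```python
# Model names taken from the table above.
KNOWN_MODELS = {"command_and_search", "phone_call", "video", "default"}


def pinned_model_kwargs(model, language_code="en-US", sample_rate_hertz=16000):
    """Return RecognitionConfig keyword arguments with an explicit model.

    Hypothetical helper: the language code and sample rate defaults are
    assumptions, not values from the original question.
    """
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return {
        "model": model,
        "language_code": language_code,
        "sample_rate_hertz": sample_rate_hertz,
    }


def transcribe(gcs_uri, model="default"):
    """Transcribe a short LINEAR16 file in Cloud Storage with a fixed model name."""
    # Imported lazily so the helper above works without the client library.
    from google.cloud import speech

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        # Assumed encoding; change it if your audio is not LINEAR16.
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        **pinned_model_kwargs(model),
    )
    audio = speech.RecognitionAudio(uri=gcs_uri)
    response = client.recognize(config=config, audio=audio)
    # Keep the top alternative of each result.
    return [result.alternatives[0].transcript for result in response.results]
```

You would call it as transcribe("gs://your-bucket/audio.wav", model="video"), where the bucket path is a placeholder. Passing the same model name on every request at least avoids the service silently switching models based on the other config parameters.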