Using the enhanced model in the Google Cloud Speech API


I'm trying to use the enhanced models in the Google Cloud Speech API like this:

gcs_uri="gs://mybucket/averylongaudiofile.ogg"

client = speech.SpeechClient()

audio = types.RecognitionAudio(uri=gcs_uri)
config = types.RecognitionConfig(
        encoding=enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
        language_code='en-US',
        sample_rate_hertz=48000,
        use_enhanced=True,
        model='phone_call',
        enable_word_time_offsets=True,
        enable_automatic_punctuation=True)

operation = client.long_running_recognize(config, audio)

I have enabled data logging in the 'Cloud Speech API' settings for my project, so that I can use the enhanced model.

When I run it, it throws the following error:

Traceback (most recent call last):
  File "./transcribe.py", line 126, in <module>
    enable_automatic_punctuation=True)
ValueError: Protocol message RecognitionConfig has no "use_enhanced" field.

Any suggestions?


Solution

  • The "use_enhanced" field is available on the RecognitionConfig type in the v1p1beta1 package. The error means that the RecognitionConfig class you are importing does not define that field.

    To run your example, you only need to change your imports to use that package, for example:

    # Import the beta package, whose RecognitionConfig defines use_enhanced.
    import google.cloud.speech_v1p1beta1 as speech

    gcs_uri = "gs://mybucket/averylongaudiofile.ogg"

    client = speech.SpeechClient()
    audio = speech.types.RecognitionAudio(uri=gcs_uri)
    config = speech.types.RecognitionConfig(
        encoding=speech.enums.RecognitionConfig.AudioEncoding.OGG_OPUS,
        language_code='en-US',
        sample_rate_hertz=48000,
        use_enhanced=True,
        model='phone_call',
        enable_word_time_offsets=True,
        enable_automatic_punctuation=True)

    # Start the asynchronous recognition job for the long audio file.
    operation = client.long_running_recognize(config, audio)
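
If you want to check up front whether the RecognitionConfig you imported knows about a field, rather than waiting for the ValueError, something like the following may help. It is only a sketch: supports_field and FakeV1Config are hypothetical names made up for illustration, and it assumes a client library release where the classes under types are raw protobuf message classes exposing DESCRIPTOR.fields_by_name. The stand-in class is there so the snippet runs without google-cloud-speech installed.

```python
from types import SimpleNamespace


def supports_field(message_cls, field_name):
    """Return True if a protobuf-style message class declares `field_name`."""
    descriptor = getattr(message_cls, "DESCRIPTOR", None)
    if descriptor is None:
        # Not a raw protobuf message class; we cannot tell, so report False.
        return False
    return field_name in descriptor.fields_by_name


# Hypothetical stand-in for a RecognitionConfig class that lacks use_enhanced.
# It only mimics the DESCRIPTOR.fields_by_name mapping of a protobuf message.
class FakeV1Config:
    DESCRIPTOR = SimpleNamespace(
        fields_by_name={"model": None, "language_code": None})


print(supports_field(FakeV1Config, "use_enhanced"))  # False
print(supports_field(FakeV1Config, "model"))  # True
```

With the library installed, and under the same assumption about the release, the equivalent check would be supports_field(speech.types.RecognitionConfig, "use_enhanced").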