Tags: python, google-cloud-platform, google-speech-api

Unable to improve transcription accuracy with speech adaptation boost


I'm using the SpeechRecognition Python library to perform speech-to-text operations. I call its recognize_google_cloud function, which uses the Google Cloud Speech-to-Text API.

Here is my code:

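import speech_recognition as sr
import json

# Read the service-account key file as a single-line JSON string.
j = ''

with open('key.json', 'r') as f:
    j = f.read().replace('\n', '')

# Parsed copy of the credentials (not used below).
js = json.loads(j)

r = sr.Recognizer()
mic = sr.Microphone()

# Note: `candide` (the audio source) is not defined in this excerpt.
with candide as source:
    audio = r.record(source)
    print(r.recognize_google_cloud(audio, language='fr-FR', preferred_phrases=['pistoles', 'disait'], credentials_json=j))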

The function recognize_google_cloud sends the data captured by the microphone to the Google API and selects the most probable transcription of the given speech from a set of alternatives. The parameter preferred_phrases, as explained in this page of the documentation, is used to select another alternative that contains the listed words.

It is possible to improve these results using speech adaptation boost. As the current version of the SpeechRecognition library doesn't let us specify a boost value, I updated the speech_recognition/__init__.py file with a hard-coded boost value:

        if preferred_phrases is not None:
            speech_config["speechContexts"] = {"phrases": preferred_phrases, "boost": 19}

Unfortunately, when I execute my code, I get the following error:

Traceback (most recent call last):
  File "/home/pierre/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 931, in recognize_google_cloud
    response = request.execute()
  File "/home/pierre/.local/lib/python3.8/site-packages/googleapiclient/_helpers.py", line 134, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/home/pierre/.local/lib/python3.8/site-packages/googleapiclient/http.py", line 915, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://speech.googleapis.com/v1/speech:recognize?alt=json returned "Invalid JSON payload received. Unknown name "boost" at 'config.speech_contexts': Cannot find field.". Details: "[{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'config.speech_contexts', 'description': 'Invalid JSON payload received. Unknown name "boost" at \'config.speech_contexts\': Cannot find field.'}]}]">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "spech_reco.py", line 23, in <module>
    print(r.recognize_google_cloud(audio, language='fr-FR', preferred_phrases=['pistoles', 'disait'], credentials_json=j));
  File "/home/pierre/.local/lib/python3.8/site-packages/speech_recognition/__init__.py", line 933, in recognize_google_cloud
    raise RequestError(e)
speech_recognition.RequestError: <HttpError 400 when requesting https://speech.googleapis.com/v1/speech:recognize?alt=json returned "Invalid JSON payload received. Unknown name "boost" at 'config.speech_contexts': Cannot find field.". Details: "[{'@type': 'type.googleapis.com/google.rpc.BadRequest', 'fieldViolations': [{'field': 'config.speech_contexts', 'description': 'Invalid JSON payload received. Unknown name "boost" at \'config.speech_contexts\': Cannot find field.'}]}]">

Is there an error in my request?


Solution

  • I understand that you are modifying the speech_recognition/__init__.py file of the SpeechRecognition library in order to include the "boost" parameter in your request.

    When reviewing this file, I noticed that it uses the 'v1' version of the API; however, the "boost" parameter is only supported in the 'v1p1beta1' version.

    Therefore, another change you could make in that file is the following:

    `speech_service = build("speech", "v1p1beta1", credentials=api_credentials, cache_discovery=False)`

    With this modification you should no longer see the BadRequest error.

    At the same time, please keep in mind that this is a third-party library that uses the Google Speech-to-Text API internally. If it does not cover all your current needs, an alternative is to write your own implementation directly with the Speech-to-Text API Python client library.
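
To make the version change concrete, here is a minimal sketch of the request body that a direct call to the 'v1p1beta1' recognize endpoint might carry. It uses only the standard library and does not send anything. The helper name build_recognize_body and the two sample audio bytes are hypothetical, and the sketch assumes the audio is in a self-describing format, so encoding and sample-rate fields are left out.

```python
import base64
import json

# API version that accepts "boost", according to the answer above.
API_VERSION = "v1p1beta1"
RECOGNIZE_URL = f"https://speech.googleapis.com/{API_VERSION}/speech:recognize"


def build_recognize_body(audio_bytes, language, phrases, boost):
    """Hypothetical helper: build the JSON body for a speech:recognize request."""
    return {
        "config": {
            "languageCode": language,
            # speechContexts is a list; each entry has its own phrases and boost.
            "speechContexts": [{"phrases": list(phrases), "boost": boost}],
        },
        # Audio bytes are sent base64-encoded in the "content" field.
        "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
    }


# Placeholder bytes stand in for real audio data.
body = build_recognize_body(b"\x00\x01", "fr-FR", ["pistoles", "disait"], 19)
payload = json.dumps(body)
```

The same body posted to the 'v1' URL would presumably be rejected with the "Unknown name "boost"" error shown in the question, which is why the version in the URL matters more than the shape of the payload.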