How to work with the result from the Google Speech-to-Text API


I am working with the Google Speech-to-Text API. It returns an object of type google.cloud.speech_v1.types.RecognizeResponse. I have found this almost unusable in Python, as I cannot iterate over it to get the multiple text strings returned.

After much searching for a way to make this usable in Python, I found a solution on Stack Overflow that uses google.protobuf.json_format.MessageToJson(). However, when I run the function below...

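import base64
import json
import os

from google.cloud import speech
from google.protobuf.json_format import MessageToJson

def transcribe(self, fp):
    transcribed = []

    data = fp.read()
    # Note: the base64 string is never used below; audio['content'] takes the raw bytes.
    speech_content_bytes = base64.b64encode(data)
    speech_content = speech_content_bytes.decode('utf-8')

    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = self.json_path
    os.environ["GCLOUD_PROJECT"] = proj_name  # proj_name is defined elsewhere (not shown)
    config = {'language_code': 'en-US'}
    audio = {'content': data}

    client = speech.SpeechClient()
    # Keyword arguments avoid depending on the positional parameter order.
    response = client.recognize(config=config, audio=audio)
    print('response is a ' + str(type(response)))
    result_json = MessageToJson(response)
    print('result_json is a ' + str(type(result_json)))
    result_json = json.loads(result_json)
    print('now result_json is a ' + str(type(result_json)))

    # Keep the first (most likely) alternative of each result.
    for result in result_json["results"]:
        transcribed.append(result["alternatives"][0]["transcript"].upper())

    return transcribed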

...I get the following output:

response is a <class 'google.cloud.speech_v1.types.RecognizeResponse'>
result_json is a <class 'str'>
now result_json is a <class 'dict'>

As you can see, the result of running Google's MessageToJson function is actually a string, and I have to load it into a dict using the json.loads function.

  • Why would the MessageToJson function return a string, rather than a dict / JSON object?
  • Is there another way to work with the google.cloud.speech_v1.types.RecognizeResponse object in Python to get the transcribed text?

I don't understand why Google returns this object, which is so difficult to work with.


Solution

  • MessageToJson converts the RecognizeResponse from a protobuf message to JSON. JSON is a text format, so the function returns a str; json.loads is what turns that text into a dict.

    You can work directly with the RecognizeResponse in the following way:

    from google.cloud.speech_v1.types import RecognizeResponse

    response: RecognizeResponse = client.recognize(config=your_config, audio=your_audio)
    final_transcripts = []
    final_transcripts_confidence = []
    for result in response.results:
        # The first alternative is the most likely transcription.
        alternative = result.alternatives[0]
        final_transcripts_confidence.append(alternative.confidence)
        final_transcripts.append(alternative.transcript)
    

    If you would still like to use MessageToJson and convert its output to a dictionary, you can do the following:

    import json
    from google.protobuf.json_format import MessageToJson
    
    response: RecognizeResponse = client.recognize(config=your_config, audio=your_audio)
    response_json_str = MessageToJson(response, indent=0)
    response_dict = json.loads(response_json_str)
    

    Alternatively, use MessageToDict (also in google.protobuf.json_format) to convert the message directly to a dictionary.

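    NOTE:
    In newer versions of the library the response is a proto-plus wrapper rather than a raw protobuf message, so passing it to MessageToJson or MessageToDict fails with: AttributeError: 'DESCRIPTOR'

    To solve this, use the conversion methods on the response class itself. This one returns a JSON string: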

    RecognizeResponse.to_json(response)
    

    or alternatively, to get a dictionary directly:

    RecognizeResponse.to_dict(response)
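
Once the response is a plain dict (from to_dict, MessageToDict, or json.loads on the JSON string), collecting the transcripts is ordinary dictionary work. Here is a minimal sketch of that step. It runs without the Speech library: sample_response is a hand-written stand-in containing only the fields used in this thread (results, alternatives, transcript, confidence), not real API output, and top_transcripts is a hypothetical helper name. The round trip through json.dumps and json.loads mirrors the string-versus-dict difference discussed above.

```python
import json

# Hand-written stand-in for the dict form of a RecognizeResponse.
# Only the fields used in this thread are included; real responses carry more.
sample_response = {
    "results": [
        {"alternatives": [{"transcript": "hello world", "confidence": 0.92}]},
        {"alternatives": [{"transcript": "how are you", "confidence": 0.87}]},
        {"alternatives": []},  # defensive case: a result with no alternatives
    ]
}


def top_transcripts(response_dict):
    """Return (transcript, confidence) for the first alternative of each result."""
    pairs = []
    for result in response_dict.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # skip results that carry no alternatives
        best = alternatives[0]
        pairs.append((best.get("transcript", ""), best.get("confidence")))
    return pairs


# JSON is a text format, so the serialized form is a str;
# json.loads turns that text back into a dict.
as_text = json.dumps(sample_response)
round_tripped = json.loads(as_text)

print(type(as_text).__name__)          # str
print(type(round_tripped).__name__)    # dict
print(top_transcripts(round_tripped))  # [('hello world', 0.92), ('how are you', 0.87)]
```

With a real response you would pass in RecognizeResponse.to_dict(response) instead of the sample. Using .get with defaults keeps the loop from raising on a result that has no alternatives, at the cost of silently skipping it.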