Tags: python, ibm-watson, speech-to-text

Not getting the expected result with IBM Watson Speech To Text


When I test an MP3 file against a standard IBM Watson Speech to Text model, I get the following output:

<bound method DetailedResponse.get_result of <ibm_cloud_sdk_core.detailed_response.DetailedResponse object at 0x00000250B1853700>>

This is not an error, but it is also not the output I want.

This is my code:

api = IAMAuthenticator(api_key)
speech_to_text = SpeechToTextV1(authenticator=api)

speech_to_text.set_service_url(url)

with open(mp3-file, "rb") as audio_file:
    result = speech_to_text.recognize(
        model='de-DE_BroadbandModel', audio=audio_file, content_type="audio/mp3"
    ).get_result

print(result)

I am very new to this topic and have not really figured out what the parameters are yet. I expected output like this:

{'result': [...]}

I followed this tutorial. What am I doing wrong?


Solution

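  • The output shows that get_result was referenced but never called: the code in the question ends with .get_result instead of .get_result(), so print shows the method object rather than the response. Here's the code that worked for me, using a sample file audio-file2.mp3: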

    import json
    from os.path import join, dirname
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    
    # Replace {api_key} and {url} with your service credentials.
    authenticator = IAMAuthenticator('{api_key}')
    speech_to_text = SpeechToTextV1(authenticator=authenticator)
    speech_to_text.set_service_url('{url}')
    
    # The audio file is expected to sit next to this script.
    audio_path = join(dirname(__file__), 'audio-file2.mp3')
    with open(audio_path, 'rb') as audio_file:
        # Call get_result() (with parentheses) to get the parsed response as a dict.
        speech_recognition_results = speech_to_text.recognize(
            audio=audio_file,
            content_type='audio/mp3',
            word_alternatives_threshold=0.9
        ).get_result()
    print(json.dumps(speech_recognition_results, indent=2))
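
To see why the code in the question printed a bound method, here is a minimal sketch that runs without the Watson SDK or credentials. FakeResponse is a hypothetical stand-in for the SDK's DetailedResponse, not the real class; it only illustrates the difference between referencing a method and calling it.

```python
class FakeResponse:
    """Hypothetical stand-in for the SDK's DetailedResponse (not the real class)."""

    def __init__(self, result):
        self._result = result

    def get_result(self):
        # Return the payload that was stored at construction time.
        return self._result


response = FakeResponse({"results": []})

not_called = response.get_result   # no parentheses: the method object itself
called = response.get_result()     # parentheses: the value the method returns

print(not_called)  # prints something like <bound method FakeResponse.get_result of ...>
print(called)      # prints the stored dict
```

Python treats a method name without parentheses as an ordinary object, so the assignment succeeds silently and the mistake only shows up when the value is printed or used.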
    

    Steps:

    1. Create a Watson Speech to Text service instance.
    2. Replace {url} and {api_key} in the Python code with the credentials of that service.
    3. Save the code as speech-to-text.py.
    4. From a command prompt or terminal, run pip install ibm-watson, then run python speech-to-text.py.

    Refer to the Speech to Text API docs for more options. The output should look similar to this:

    {
      "result_index": 0,
      "results": [
        {
          "final": true,
          "alternatives": [
            {
              "transcript": "a line of severe thunderstorms with several possible tornadoes is approaching Colorado on Sunday ",
              "confidence": 1.0
            }
          ],
          "word_alternatives": [
            {
              "start_time": 0.2,
              "end_time": 0.35,
              "alternatives": [
                {
                  "word": "a",
                  "confidence": 0.94
                }
              ]
            },
            {
              "start_time": 0.35,
              "end_time": 0.69,
              "alternatives": [
                {
                  "word": "line",
                  "confidence": 0.94
                }
              ]
            },
            {
              "start_time": 0.69,
              "end_time": 0.78,
              "alternatives": [
                {
                  "word": "of",
                  "confidence": 1.0
                }
              ]
            },
            {
              "start_time": 0.78,
              "end_time": 1.13,
              "alternatives": [
                {
                  "word": "severe",
                  "confidence": 1.0
                }
              ]
            },
            {
              "start_time": 1.13,
              "end_time": 1.9,
              "alternatives": [
                {
                  "word": "thunderstorms",
                  "confidence": 1.0
                }
              ]
            },
            {
              "start_time": 4.0,
              "end_time": 4.18,
              "alternatives": [
                {
                  "word": "is",
                  "confidence": 1.0
                }
              ]
            },
            {
              "start_time": 4.18,
              "end_time": 4.63,
              "alternatives": [
                {
                  "word": "approaching",
                  "confidence": 1.0
                }
              ]
            },
            {
              "start_time": 4.63,
              "end_time": 5.21,
              "alternatives": [
                {
                  "word": "Colorado",
                  "confidence": 0.93
                }
              ]
            },
            {
              "start_time": 5.21,
              "end_time": 5.37,
              "alternatives": [
                {
                  "word": "on",
                  "confidence": 0.93
                }
              ]
            },
            {
              "start_time": 5.37,
              "end_time": 6.09,
              "alternatives": [
                {
                  "word": "Sunday",
                  "confidence": 0.94
                }
              ]
            }
          ]
        }
      ]
    }
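
Once get_result() returns a plain dict, the transcript can be read out of it directly. The following is a minimal sketch, assuming the response has the shape shown above; extract_transcripts and confident_words are hypothetical helper names, and sample_result is a trimmed copy of the output above rather than a live API response.

```python
def extract_transcripts(result):
    """Return the first transcript of each result segment, whitespace-trimmed."""
    transcripts = []
    for segment in result.get("results", []):
        alternatives = segment.get("alternatives", [])
        if alternatives:
            transcripts.append(alternatives[0]["transcript"].strip())
    return transcripts


def confident_words(result, threshold=0.95):
    """Return words whose best alternative has at least the given confidence."""
    words = []
    for segment in result.get("results", []):
        for entry in segment.get("word_alternatives", []):
            best = max(entry["alternatives"], key=lambda alt: alt["confidence"])
            if best["confidence"] >= threshold:
                words.append(best["word"])
    return words


# Trimmed copy of the response printed above (only the first three words kept).
sample_result = {
    "result_index": 0,
    "results": [
        {
            "final": True,
            "alternatives": [
                {
                    "transcript": "a line of severe thunderstorms with several "
                                  "possible tornadoes is approaching Colorado on Sunday ",
                    "confidence": 1.0,
                }
            ],
            "word_alternatives": [
                {"start_time": 0.2, "end_time": 0.35,
                 "alternatives": [{"word": "a", "confidence": 0.94}]},
                {"start_time": 0.35, "end_time": 0.69,
                 "alternatives": [{"word": "line", "confidence": 0.94}]},
                {"start_time": 0.69, "end_time": 0.78,
                 "alternatives": [{"word": "of", "confidence": 1.0}]},
            ],
        }
    ],
}

transcripts = extract_transcripts(sample_result)
high_confidence = confident_words(sample_result)
print(transcripts)
print(high_confidence)
```

Using .get() with defaults keeps the helpers from raising if a response contains no results, which can happen when the audio has no recognizable speech.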