Tags: python, ibm-cloud, speech-recognition, ibm-watson, speech-to-text

IBM Watson Speech to Text in Python gives 404 when using model parameter


I am testing IBM Watson Speech to Text with Python. I was able to transcribe an English audio file successfully, but when I add the model parameter to switch to the language model for my language, I get a 404 Not Found error. I have read the IBM page that explains the model parameter several times and I can't work out what is missing. Can anyone help?

My code:

from ibm_watson import SpeechToTextV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

api = IAMAuthenticator("my_credential")
speech_2_text = SpeechToTextV1(authenticator=api)

speech_2_text.set_service_url("https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/20a185d6-6953-4334-9cea-e9f5ebc2267d?model=fr-FR_BroadbandModel")

with open("test.mp3", "rb") as audio_file:
    result = speech_2_text.recognize(
        audio=audio_file,
        content_type="audio/mp3"
    ).get_result()

Error message:

ibm_cloud_sdk_core\base_service.py", line 224, in send
    raise ApiException(
ibm_cloud_sdk_core.api_exception.ApiException: Error: Not Found, Code: 404

Solution

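  • Pass the model as the model argument of the recognize method, not as a query string on the service URL. The service URL should contain only the instance endpoint, with no ?model=... suffix.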

    speech_recognition_results = speech_to_text.recognize(
            audio=audio_file,
            content_type='audio/mp3',
            word_alternatives_threshold=0.9,
            model='fr-FR_BroadbandModel'
        ).get_result()
    

    Here is the complete code that worked for me, for reference:

    import json
    from os.path import join, dirname
    from ibm_watson import SpeechToTextV1
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
    
    authenticator = IAMAuthenticator('<API_KEY>')
    speech_to_text = SpeechToTextV1(
        authenticator=authenticator
    )
    
    speech_to_text.set_service_url('<URL>')
    
    with open(join(dirname(__file__), 'audio-file2.mp3'), 'rb') as audio_file:
        speech_recognition_results = speech_to_text.recognize(
            audio=audio_file,
            content_type='audio/mp3',
            word_alternatives_threshold=0.9,
            model='fr-FR_BroadbandModel'
        ).get_result()
    print(json.dumps(speech_recognition_results, indent=2))
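As a rough illustration of why the query string on the service URL likely leads to a 404, the sketch below mimics a client that simply appends the operation path to the service URL. This is an assumption about how the SDK builds request URLs, not a copy of its code; `build_request_url` is a hypothetical helper and `INSTANCE_ID` is a placeholder.

```python
from urllib.parse import urlsplit


def build_request_url(service_url: str, path: str) -> str:
    # Assumption: the client concatenates the service URL and the operation path.
    return service_url.rstrip("/") + path


# Placeholder instance URL, not a real endpoint.
BASE = "https://api.us-south.speech-to-text.watson.cloud.ibm.com/instances/INSTANCE_ID"

# Service URL with the model stuck on as a query string (as in the question).
bad = urlsplit(build_request_url(BASE + "?model=fr-FR_BroadbandModel", "/v1/recognize"))

# Service URL left clean; the model would go to recognize(model=...) instead.
good = urlsplit(build_request_url(BASE, "/v1/recognize"))

print(bad.path)    # /instances/INSTANCE_ID
print(bad.query)   # model=fr-FR_BroadbandModel/v1/recognize
print(good.path)   # /instances/INSTANCE_ID/v1/recognize
```

With the query string in the service URL, the operation path ends up inside the query, so the request never reaches the recognize endpoint. With a clean service URL the path is intact, and the model is sent as a proper parameter by recognize.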