
IBM Watson Speech to Text Audio/Basic not accepting narrowband .WAV


I have written a program in Python 3.6 that uses IBM Watson's Speech to Text library. The program searches a folder and reads each .wav file individually. For each file, it is supposed to check the sample rate and set the content type of the Watson request accordingly. It then takes the response and maps it to a list. Stub testing narrowed the problem down to this code:

        speech_to_text.set_detailed_response(True)

        # Narrowband
        if rate < 16000:
            x = json.loads(
                json.dumps(speech_to_text.recognize(audio_file, content_type='audio/basic', timestamps=True, max_alternatives=0).get_result(),
                indent=2), object_hook=lambda d: namedtuple('X', d.keys())(*d.values())
                )

        # Broadband
        else:
            x = json.loads(
                json.dumps(speech_to_text.recognize(audio_file, content_type='audio/wav', timestamps=True, max_alternatives=0).get_result(),
                indent=2), object_hook=lambda d: namedtuple('X', d.keys())(*d.values())
                )

This program works fine when I supply a file sampled at 16 kHz or higher. For anything below that, I get this error:

  File "echo_cli.py", line 64, in <module>
    json.dumps(speech_to_text.recognize(audio_file, content_type='audio/basic', timestamps=True, max_alternatives=0).get_result(),
  File "C:\Python37\lib\site-packages\watson_developer_cloud\speech_to_text_v1.py", line 373, in recognize
    accept_json=True)
  File "C:\Python37\lib\site-packages\watson_developer_cloud\watson_service.py", line 479, in request
    info=error_info, httpResponse=response)
watson_developer_cloud.watson_service.WatsonApiException: Error: This 8000hz audio input requires a narrow band model.  See https://<STT_API_ENDPOINT>/v1/models for a list of available models., Code: 400 , Information: {'code_description': 'Bad Request'} , X-dp-watson-tran-id: stream01-167902601 , X-global-transaction-id: f257b1145ba417780a01fd89

As a note, the files I'm using are on a network drive. However, I get the same error when I copy them to my local drive, so I think the network drive is unrelated. I'm mentioning it just in case it rings any bells I'm unaware of.

According to this documentation, I should be able to send a narrowband file with audio/basic, and print statements confirm that my program takes the narrowband branch when I load a narrowband .wav. What am I doing wrong?

Thanks!


Solution

  • Pass the audio/basic MIME type only if the file you're uploading is actually in that format (also known as a "Sun .au" file, it's one of the oldest audio file types out there). If you're uploading a WAV file, specify the MIME type as audio/wav, no matter what the sample rate is.
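
As a rough sketch of how this advice could be applied to the code in the question: the content type stays audio/wav for every file, and the sample rate is used to choose a model instead. The error message asks for a narrowband model, so the sketch assumes that passing one through the model argument of recognize() is what the service expects. The helper name recognize_kwargs and the two model names are assumptions for illustration; the /v1/models endpoint mentioned in the error lists what your instance actually offers.

```python
import wave

# Assumed model names (US English); check /v1/models for your instance.
NARROWBAND_MODEL = "en-US_NarrowbandModel"
BROADBAND_MODEL = "en-US_BroadbandModel"


def recognize_kwargs(wav_path):
    """Hypothetical helper: build recognize() keyword arguments for a WAV file.

    The content type is always audio/wav; only the model depends on the
    sample rate read from the WAV header.
    """
    with wave.open(wav_path, "rb") as wav:
        rate = wav.getframerate()
    return {
        "content_type": "audio/wav",
        "model": NARROWBAND_MODEL if rate < 16000 else BROADBAND_MODEL,
    }


# Possible usage with the client from the question (not executed here):
#
#     with open(path, "rb") as audio_file:
#         result = speech_to_text.recognize(
#             audio_file, timestamps=True, **recognize_kwargs(path)
#         ).get_result()
```

Reading the rate from the WAV header with the standard-library wave module keeps the branch in one place, so both the narrowband and broadband paths share a single recognize() call.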