I have written a program in Python 3.6 that uses IBM Watson's Speech to Text library. The program searches a folder and reads each .wav file individually. For each file, it's supposed to check the sample rate and call my IBM Watson integration with different settings accordingly. Then, it takes the response and maps it to a list. Stub testing narrowed the problem down to this code:
speech_to_text.set_detailed_response(True)
# Narrowband
if rate < 16000:
    x = json.loads(
        json.dumps(speech_to_text.recognize(audio_file, content_type='audio/basic', timestamps=True, max_alternatives=0).get_result(),
                   indent=2), object_hook=lambda d: namedtuple('X', d.keys())(*d.values())
    )
# Broadband
else:
    x = json.loads(
        json.dumps(speech_to_text.recognize(audio_file, content_type='audio/wav', timestamps=True, max_alternatives=0).get_result(),
                   indent=2), object_hook=lambda d: namedtuple('X', d.keys())(*d.values())
    )
This program works correctly when I supply a file sampled at 16 kHz or higher. With anything lower than that, however, I get this error:
File "echo_cli.py", line 64, in <module>
json.dumps(speech_to_text.recognize(audio_file, content_type='audio/basic', timestamps=True, max_alternatives=0).get_result(),
File "C:\Python37\lib\site-packages\watson_developer_cloud\speech_to_text_v1.py", line 373, in recognize
accept_json=True)
File "C:\Python37\lib\site-packages\watson_developer_cloud\watson_service.py", line 479, in request
info=error_info, httpResponse=response)
watson_developer_cloud.watson_service.WatsonApiException: Error: This 8000hz audio input requires a narrow band model. See https://<STT_API_ENDPOINT>/v1/models for a list of available models., Code: 400 , Information: {'code_description': 'Bad Request'} , X-dp-watson-tran-id: stream01-167902601 , X-global-transaction-id: f257b1145ba417780a01fd89
As a note, the files I'm using are on a network drive. However, I get the same error when I copy them to my local drive, so I think that is unrelated. I'm mentioning it just in case it rings any bells I'm unaware of.
According to this documentation, I should be able to accept a narrowband file with audio/basic, and according to print commands I've used, when I load a narrowband .wav, my program is executing the correct code. What am I doing wrong?
Thanks!
You should only pass the audio/basic MIME type if that's the type of the file you're uploading (also known as a "Sun .au" file, it's one of the oldest audio file types out there). If you're uploading a WAV file, specify the MIME type as audio/wav, no matter what the sample rate.
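As a rough sketch of that advice, the helper below reads the sample rate from the WAV header with Python's standard wave module and always reports audio/wav as the content type. The function name recognize_kwargs_for_wav is hypothetical and not part of the Watson SDK. It also picks a model by sample rate, which goes a step beyond the advice above. That part assumes recognize accepts a model keyword and that the en-US_NarrowbandModel and en-US_BroadbandModel names exist on your service instance; the /v1/models endpoint named in the error message is the place to confirm both.

```python
import io
import wave


def recognize_kwargs_for_wav(wav_bytes, language="en-US"):
    """Build keyword arguments for a recognize() call from raw WAV bytes.

    Hypothetical helper: the model names are assumptions and should be
    checked against the list your service returns from /v1/models.
    """
    # Read the sample rate straight from the WAV header.
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()

    # Same threshold as the question: below 16 kHz counts as narrowband.
    band = "Narrowband" if rate < 16000 else "Broadband"

    return {
        # A WAV upload is always audio/wav, whatever the sample rate.
        "content_type": "audio/wav",
        # Assumed model naming scheme, e.g. "en-US_NarrowbandModel".
        "model": f"{language}_{band}Model",
        "timestamps": True,
    }


# Possible usage with the client from the question (not executed here):
#   with open(path, "rb") as audio_file:
#       kwargs = recognize_kwargs_for_wav(audio_file.read())
#       audio_file.seek(0)
#       result = speech_to_text.recognize(audio_file, **kwargs).get_result()
```

Keeping the content type fixed and varying only the model separates the two concerns: the MIME type describes the container format, while the model has to match the sample rate of the audio inside it.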