
Problem calling AudioConfig.FromWavFileInput through the Python library


I am trying to process a .wav file with the Azure Cognitive Speech Service, using the script below. When I try to set up the wav file by calling AudioConfig.FromWavFileInput(), I get an exception that says "type object 'AudioConfig' has no attribute 'FromWavFileInput'". The documentation says the function exists, at least in the .NET library. Does FromWavFileInput exist in the cognitiveservices-speech Python library? How can I process an audio file with Python?

import azure.cognitiveservices.speech as speechsdk

speechKey = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
service_region = 'eastus2'

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
speech_config = speechsdk.SpeechConfig(subscription=speechKey, region=service_region)

audioInput = speechsdk.AudioConfig.FromWavFileInput('RainSpain.wav')

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_input=audioInput)

Solution

  • You are right. I searched the GitHub repo Azure-Samples/cognitive-services-speech-sdk for the keywords AudioConfig and FromWavFileInput, and the only matches are in the Java, C#, and C++ code; there is no Python code that uses it.

    In my experience, there are two workarounds:

    1. Wrap the C++ code as a Python extension module, or call into the C++/Java code from Python.
    2. Call the Speech service REST API directly with requests, which is simple to do from Python.
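As a rough illustration of the second workaround, here is a minimal sketch that posts the WAV bytes to the short-audio speech-to-text REST endpoint. The function names (build_rest_request, transcribe_with_rest, transcribe_with_sdk) are made up for this example. It assumes 16 kHz, 16-bit mono PCM audio and an en-US recording. The block also includes a second route that may be simpler if the installed package supports it: as far as I know, the Python SDK takes the file through the AudioConfig constructor (filename=...) and the recognizer takes it as audio_config, rather than through a FromWavFileInput factory.

```python
import os

# Short-audio speech-to-text REST endpoint; the region is filled in per call.
ENDPOINT = ("https://{region}.stt.speech.microsoft.com"
            "/speech/recognition/conversation/cognitiveservices/v1")


def build_rest_request(region, key, language="en-US"):
    """Return (url, headers) for a short-audio recognition request."""
    url = ENDPOINT.format(region=region) + "?language=" + language
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumption: 16 kHz, 16-bit, mono PCM audio in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return url, headers


def transcribe_with_rest(wav_path, region, key, language="en-US"):
    """Workaround 2: POST the WAV bytes to the REST API with requests."""
    import requests  # third-party: pip install requests

    url, headers = build_rest_request(region, key, language)
    with open(wav_path, "rb") as f:
        response = requests.post(url, headers=headers, data=f.read(), timeout=30)
    response.raise_for_status()
    # The default "simple" result format carries RecognitionStatus and DisplayText.
    return response.json().get("DisplayText", "")


def transcribe_with_sdk(wav_path, region, key):
    """SDK route: pass the file to the AudioConfig constructor."""
    if not os.path.isfile(wav_path):
        raise FileNotFoundError(wav_path)
    # Imported lazily so the REST helpers work without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config,
                                            audio_config=audio_config)
    result = recognizer.recognize_once()  # blocks until one utterance is recognized
    if result.reason == speechsdk.ResultReason.RecognizedSpeech:
        return result.text
    return ""
```

With the variables from the question, a call would look like transcribe_with_rest('RainSpain.wav', service_region, speechKey). The REST route is meant for short clips only; for longer recordings the SDK route is probably the better fit.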