Tags: python, python-3.x, azure, azure-speech, azure-text-translation

How can we pass an audio file from an Azure Storage container to the Azure Speech API using Python?


Below is the code:

call_name1="test.wav"
blob_client1=blob_service_client.get_blob_client("bucket/audio",call_name1)
print(blob_client1)

streamdownloader=blob_client1.download_blob()
stream = BytesIO()
streamfinal=streamdownloader.download_to_stream(stream)
print(streamfinal)

speech_key, service_region = "12345", "eastus"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

audio_input = speechsdk.audio.AudioConfig(filename=streamfinal)

Error:

TypeError                                 Traceback (most recent call last)
<ipython-input-6-a402ae91606a> in <module>
     44 speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
     45
---> 46 audio_input = speechsdk.audio.AudioConfig(filename=streamfinal)

C:\ProgramData\Anaconda3\lib\site-packages\azure\cognitiveservices\speech\audio.py in __init__(self, use_default_microphone, filename, stream, device_name)
    213
    214         if filename is not None:
--> 215             self._impl = impl.AudioConfig._from_wav_file_input(filename)
    216             return
    217         if stream is not None:

TypeError: in method 'AudioConfig__from_wav_file_input', argument 1 of type 'std::string const &'

Please help us read audio files from a storage container and use them as input to the Azure Speech API. Thank you!


Solution

  • As ewong said in the comments, the problem is the type of the argument. The filename parameter of AudioConfig expects a string (the path to a WAV file), but the code passes the return value of download_to_stream, which is not a string.

    download_to_stream downloads the contents of the blob to a stream, but that stream is not the azure.cognitiveservices.speech.audio.AudioInputStream that AudioConfig requires for its stream parameter.

    I could not find a way to convert the downloaded stream to an AudioInputStream, so the only option seems to be to download the audio file from Blob Storage to local disk and then pass its path to AudioConfig.

    from azure.storage.blob import BlobServiceClient
    import azure.cognitiveservices.speech as speechsdk

    filename = "test.wav"  # AudioConfig expects a WAV file
    container_name = "test-container"

    # Replace the placeholder with your storage account's connection string.
    blob_service_client = BlobServiceClient.from_connection_string("<your-storage-connection-string>")
    container_client = blob_service_client.get_container_client(container_name)
    blob_client = container_client.get_blob_client(filename)

    # Download the blob to a local file with the same name.
    with open(filename, "wb") as f:
        data = blob_client.download_blob()
        data.readinto(f)

    # Pass the local file path (a string) to AudioConfig.
    audio_input = speechsdk.audio.AudioConfig(filename=filename)
    print(audio_input)
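
If writing a local file is not desirable, one possible alternative is to feed the downloaded bytes to the Speech SDK through a push stream. The following is only a sketch under assumptions, not something confirmed in this thread: it assumes the SDK's PushAudioInputStream and AudioStreamFormat classes accept raw PCM samples, and that the blob is an uncompressed PCM WAV file. The helper names wav_bytes_to_pcm and audio_config_from_wav_bytes are hypothetical and were made up for this illustration.

```python
import io
import wave


def wav_bytes_to_pcm(wav_bytes):
    """Split in-memory WAV data into raw PCM frames and its format.

    Returns (pcm_bytes, samples_per_second, bits_per_sample, channels).
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        pcm = wav.readframes(wav.getnframes())
        return pcm, wav.getframerate(), wav.getsampwidth() * 8, wav.getnchannels()


def audio_config_from_wav_bytes(wav_bytes):
    """Build an AudioConfig backed by a push stream instead of a local file.

    Hypothetical helper: assumes the Speech SDK is installed and that the
    audio is uncompressed PCM.
    """
    # Imported lazily so the WAV helper above works without the Speech SDK.
    import azure.cognitiveservices.speech as speechsdk

    pcm, rate, bits, channels = wav_bytes_to_pcm(wav_bytes)
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=rate, bits_per_sample=bits, channels=channels
    )
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    push_stream.write(pcm)  # hand the raw PCM samples to the SDK
    push_stream.close()     # signal that no more audio will follow
    return speechsdk.audio.AudioConfig(stream=push_stream)
```

With the blob_client from the answer above, this could be used roughly as audio_input = audio_config_from_wav_bytes(blob_client.download_blob().readall()). The WAV header is stripped with the standard-library wave module because the push stream is given an explicit sample format rather than a file.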