Tags: python, speech-to-text, azure-speech

How to pass an audio buffer to the speech-to-text service using Python


I am using the Azure speech-to-text service from Python to process a batch of audio files. These are the steps I currently perform:

  1. Download the audio from the web server to the local 'C:/audio' folder.
  2. Pass the path of the downloaded audio to the Speech SDK: AudioConfig(filename='C:/audio/my_audio.wav')

Rather than downloading to the local machine, I want to get the file from the server and pass it directly to the speech-to-text service. To do that:

  1. I stored the audio as bytes in an audio buffer, like this: raw_audio = my_audio_in_bytes  # <class 'bytes'>

  2. Then I passed the audio buffer to AudioConfig(filename=raw_audio). It doesn't work, because it expects a file path.

Is there a way to pass an audio buffer to this service?

Configuration Python code:

import azure.cognitiveservices.speech as speechsdk

# speech_key and service_region hold the Azure Speech resource key and region
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
audio_config = speechsdk.audio.AudioConfig(filename='C:/audios/audio1.wav')
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
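
For audio that is already held in memory, one option worth trying is the Speech SDK's push audio input stream, passed as AudioConfig(stream=...) instead of filename. Below is a minimal sketch, assuming raw_audio contains a complete PCM WAV file. The helper names read_wav_bytes and recognize_from_bytes are hypothetical, and recognize_once() returns only the first utterance, so longer audio would likely need continuous recognition.

```python
import io
import wave


def read_wav_bytes(raw_audio: bytes):
    """Parse in-memory WAV bytes; return (sample_rate, bits_per_sample, channels, pcm)."""
    with wave.open(io.BytesIO(raw_audio), "rb") as wav:
        sample_rate = wav.getframerate()
        bits_per_sample = wav.getsampwidth() * 8
        channels = wav.getnchannels()
        pcm = wav.readframes(wav.getnframes())
    return sample_rate, bits_per_sample, channels, pcm


def recognize_from_bytes(raw_audio: bytes, speech_key: str, service_region: str) -> str:
    """Recognize a single utterance from WAV bytes held in memory (hypothetical helper)."""
    # Imported here so the WAV helper above works without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    sample_rate, bits_per_sample, channels, pcm = read_wav_bytes(raw_audio)

    # Describe the raw PCM data so the service can interpret it.
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=sample_rate,
        bits_per_sample=bits_per_sample,
        channels=channels,
    )
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    push_stream.write(pcm)   # hand the audio bytes to the SDK
    push_stream.close()      # signal that no more audio will follow

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )
    result = speech_recognizer.recognize_once()
    return result.text
```

Stripping the WAV header with the standard wave module keeps the header bytes from being sent as audio, and lets the stream format match whatever sample rate the file actually uses.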

Solution

  • @user1990, per our discussion on this GitHub issue, please use batch transcription. The Speech SDK does not directly support recognizing from a WAV file hosted on a web service; you would first need to download it locally.
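
Batch transcription takes URLs of the audio files rather than local paths, so the download step goes away. A minimal sketch of building the job-creation request follows, assuming the v3.1 REST endpoint and an audio URL the service can read (public or SAS-signed). The function name and the displayName value are made up for illustration, and the API version should be checked against the current documentation.

```python
import json
import urllib.request


def build_batch_transcription_request(service_region, speech_key, content_urls, locale="en-US"):
    """Build (but do not send) a POST request that creates a batch transcription job."""
    # Assumed endpoint shape for the v3.1 Speech to text REST API.
    endpoint = (
        f"https://{service_region}.api.cognitive.microsoft.com"
        "/speechtotext/v3.1/transcriptions"
    )
    body = {
        "contentUrls": list(content_urls),  # audio files hosted on the web server
        "locale": locale,
        "displayName": "Batch transcription sketch",  # illustrative label only
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": speech_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(request) should return a JSON description of the job, which is then polled until the transcription completes.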