Tags: python-3.x, audio-recording, speech-to-text, azure-cognitive-services

How to access audio stream recorded by Microsoft Speech SDK


I am using a robot to hold conversations with volunteers. I am using Python 3 and Microsoft's Speech SDK to transcribe the volunteers' responses. Both the recording and the transcription are done by the Speech SDK, and I have not been able to find a way to access and save the recorded audio.

Minimal code example:

import time
import azure.cognitiveservices.speech as speechsdk

# define callback
def handle_final_result(evt):
    global stop
    print('Heard:', evt.result.text)
    if 'stop' in evt.result.text.lower():  # recognized text is often capitalized, e.g. "Stop."
        stop = True
        # TODO: somehow need to save all audio up to this point

# setup speech recognizer using microphone as input
audio_config = speechsdk.audio.AudioConfig(device_name='sysdefault:CARD=Microphone')
speech_key, service_region = "your-key-here", "your-region-here"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

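# setup callback and start listening
stop = False  # set before recognition starts so the callback cannot be overwritten
speech_recognizer.recognized.connect(handle_final_result)
speech_recognizer.start_continuous_recognition()
while not stop:
    time.sleep(0.2)
# blocking variant; the _async call returns a future that was never waited on
speech_recognizer.stop_continuous_recognition()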

There is a similar post/response for JavaScript, but I have been unable to use that sample to get things working in Python 3.


Solution

  • The Speech SDK does not currently provide an API for capturing the microphone audio it uses for transcription. That feature will be supported in future releases. If you need access to the microphone data, the recommended approach for now is to create the microphone stream outside the Speech SDK, in your own app, and then feed the audio to the recognizer through, e.g., the Speech SDK's push stream API (PushAudioInputStream). Your app sees every chunk before the SDK does, so it can capture or process the same audio for its own needs.

    https://learn.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/azure.cognitiveservices.speech.audio.pushaudioinputstream?view=azure-python
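
    The approach described above could look roughly like the sketch below. It is not an official sample. The names AudioTee, run and conversation.wav are made up for illustration. It assumes PyAudio for microphone capture (any library that yields raw PCM bytes should work) and 16 kHz, 16-bit, mono PCM as the audio format. The tee writes each chunk to a WAV file with the standard wave module and forwards the same bytes to the push stream.

```python
import wave


class AudioTee:
    """Hypothetical helper: copy each PCM chunk to a WAV file and to a sink."""

    def __init__(self, sink, wav_path, rate=16000, channels=1, sample_width=2):
        # sink is any object with write(bytes) and close(),
        # for example a speechsdk.audio.PushAudioInputStream.
        self._sink = sink
        self._wav = wave.open(wav_path, "wb")
        self._wav.setnchannels(channels)
        self._wav.setsampwidth(sample_width)  # bytes per sample (2 = 16-bit)
        self._wav.setframerate(rate)

    def write(self, chunk: bytes) -> None:
        self._wav.writeframes(chunk)  # keep a copy on disk
        self._sink.write(chunk)       # feed the recognizer

    def close(self) -> None:
        self._wav.close()
        self._sink.close()


def run(speech_key, service_region, wav_path="conversation.wav"):
    # Imported here so AudioTee stays usable without these packages.
    import azure.cognitiveservices.speech as speechsdk
    import pyaudio  # assumption: microphone capture via PyAudio

    rate, chunk_frames = 16000, 1600  # 1600 frames = 100 ms at 16 kHz

    # Push stream that the app fills itself instead of the SDK opening the mic.
    stream_format = speechsdk.audio.AudioStreamFormat(
        samples_per_second=rate, bits_per_sample=16, channels=1)
    push_stream = speechsdk.audio.PushAudioInputStream(stream_format=stream_format)
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)

    state = {"stop": False}

    def handle_final_result(evt):
        print("Heard:", evt.result.text)
        if "stop" in evt.result.text.lower():
            state["stop"] = True

    recognizer.recognized.connect(handle_final_result)

    tee = AudioTee(push_stream, wav_path, rate=rate)
    pa = pyaudio.PyAudio()
    mic = pa.open(format=pyaudio.paInt16, channels=1, rate=rate,
                  input=True, frames_per_buffer=chunk_frames)
    recognizer.start_continuous_recognition()
    try:
        while not state["stop"]:
            tee.write(mic.read(chunk_frames))  # blocks for about 100 ms
    finally:
        mic.stop_stream()
        mic.close()
        pa.terminate()
        tee.close()  # finalizes the WAV file and signals end of audio
        recognizer.stop_continuous_recognition()
```

    Calling run("your-key-here", "your-region-here") would then record until a recognized phrase contains "stop" and leave everything heard up to that point in conversation.wav. Because the callback only sets a flag, all file and microphone handling stays on the main thread.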