python azure text-to-speech azure-cognitive-services

Azure text-to-speech: play the output through a virtual microphone using Python

My use case is to convert text to speech using Azure and then play it into a virtual microphone.

option 1 - with an intermediate .wav file

I tried both steps manually in a Jupyter notebook.
The problem is that the .wav file written by Azure cannot be played straight away from Python; it fails with "error: No file 'file.wav' found in working directory". After I restart the Python kernel, the audio plays.

text-to-speech

audio_config = speechsdk.audio.AudioOutputConfig(filename="file.wav")
...
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=audio_config)
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()

audio play

mixer.init(devicename = 'Line 1 (Virtual Audio Cable)')
mixer.music.load("file.wav")
mixer.music.play()

option 2 - direct stream to audio device

I tried to configure the audio output device of the Azure SDK. This method worked for regular output devices, but when I pass the ID of the virtual microphone, it does not play any sound.

audio_config = speechsdk.audio.AudioOutputConfig(use_default_speaker=False,device_name="{0.0.0.00000000}.{9D30BDBF-1418-4AFC-A709-CD4C431833E2}")

It would also be much better if there were another method that could direct the audio to a virtual microphone instead of the speaker.

Solution

I found a solution by changing the output to a stream, saving it to a file, and then playing it through pygame, as follows:

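import azure.cognitiveservices.speech as speechsdk
from pygame import mixer

# speech_config and text are set up as in the question.
# audio_config=None keeps the synthesized audio in the result instead of sending it to a device.
speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
speech_synthesis_result = speech_synthesizer.speak_text_async(text).get()
# Wrap the result in a stream and write it out as a .wav file.
stream = speechsdk.AudioDataStream(speech_synthesis_result)
stream.save_to_wav_file("file.wav")

# Open the virtual audio cable as the pygame output device and play the file.
mixer.init(devicename='Line 1 (Virtual Audio Cable)')
mixer.music.load("file.wav")
mixer.music.play()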

I would also much appreciate any other method that doesn't need an intermediate audio file.
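
One possible way to skip the intermediate file, as an untested sketch rather than a confirmed answer: pygame can load a sound from a file-like object, so the synthesized bytes could be wrapped in io.BytesIO and played from memory. This assumes the result's audio_data holds a complete RIFF/WAV payload (which I believe is the case for the SDK's default output format) and that it is 16-bit PCM. The helper names wav_params and play_wav_bytes are made up for this example.

```python
import io
import wave


def wav_params(wav_bytes: bytes) -> tuple:
    """Return (sample_rate, sample_width_in_bytes, channels) of in-memory WAV data."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate(), wav.getsampwidth(), wav.getnchannels()


def play_wav_bytes(wav_bytes: bytes, device_name: str) -> None:
    """Play in-memory WAV data on the named output device, blocking until it ends."""
    # Imported here so wav_params can be used without pygame installed.
    import pygame
    from pygame import mixer

    rate, _width, channels = wav_params(wav_bytes)
    # Match the mixer to the audio so pygame does not have to resample.
    # The default sample size (signed 16-bit) is kept; assumption: the audio is 16-bit PCM.
    mixer.init(frequency=rate, channels=channels, devicename=device_name)
    sound = mixer.Sound(file=io.BytesIO(wav_bytes))
    channel = sound.play()
    # Sound.play() returns immediately, so poll until playback has finished.
    while channel.get_busy():
        pygame.time.wait(50)
```

With the synthesizer created using audio_config=None as in the solution, the call would then be play_wav_bytes(speech_synthesis_result.audio_data, 'Line 1 (Virtual Audio Cable)'), with no save_to_wav_file step.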