MoviePy: importing audio from text-to-speech in memory

I'm trying to use Azure text-to-speech in combination with MoviePy to create the audio stream for a video.

result = synthesizer.speak_ssml_async(xml_string).get()
stream = AudioDataStream(result)

The output of this process is:

<azure.cognitiveservices.speech.AudioDataStream at 0x2320cb87ac0>

However, MoviePy is not able to import this with the following command:

audioClip = AudioFileClip(stream)

This is giving me the error:

'AudioDataStream' object has no attribute 'endswith'

Do I need to convert the Azure stream to .wav? How do I do that? I need to do the entire process without writing .wav files locally (e.g. with stream.save_to_wav_file), using only in-memory streams.

Can someone shed some light on this, please?

Solution

I wrote an HTTP-trigger Python function for you; try the code below:

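import os
import tempfile

import azure.functions as func
import azure.cognitiveservices.speech as speechsdk
import imageio

# Older imageio releases fetch the ffmpeg binary this way, and it has to run
# before moviepy is imported. Newer releases deprecate this call; there,
# install the imageio-ffmpeg package and drop these two lines.
imageio.plugins.ffmpeg.download()
from moviepy.editor import AudioFileClip

speech_key = "<speech service key>"
service_region = "<speech service region>"
temp_file_path = os.path.join(tempfile.gettempdir(), "result.wav")
text = 'hello, this is a test'


def main(req: func.HttpRequest) -> func.HttpResponse:
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

    auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig()

    # audio_config=None keeps the audio in the result instead of playing it on a speaker
    speech_synthesizer = speechsdk.SpeechSynthesizer(
        speech_config=speech_config,
        auto_detect_source_language_config=auto_detect_source_language_config,
        audio_config=None)

    result = speech_synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        # Without this check, AudioFileClip below would fail on a missing file
        return func.HttpResponse("Speech synthesis failed: " + str(result.reason), status_code=500)

    # Write the synthesized audio to a temp file that AudioFileClip can open by path
    stream = speechsdk.AudioDataStream(result)
    stream.save_to_wav_file(temp_file_path)

    myclip = AudioFileClip(temp_file_path)
    duration = myclip.duration
    myclip.close()  # release the ffmpeg reader

    return func.HttpResponse(str(duration))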

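The logic is simple: get the speech audio from the Speech service, save it to a temporary file, and open that file with AudioFileClip to read its duration. AudioFileClip expects a file path (a string), which is why passing the AudioDataStream object directly raises the 'endswith' error.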
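If the temp file has to be avoided as well, one possible direction is to decode the WAV bytes in memory. The following is only a minimal sketch, not something I have run against the Speech service. It assumes that result.audio_data holds a complete 16-bit PCM RIFF/WAV payload with a valid header. The helper names wav_bytes_to_samples and make_test_wav are hypothetical, and make_test_wav only builds a stand-in for the Azure bytes so the decoding can be tried locally.

```python
import io
import struct
import wave


def wav_bytes_to_samples(wav_bytes):
    """Decode 16-bit PCM WAV bytes into (frames, sample_rate) without touching disk.

    frames is a list with one entry per audio frame; each entry is a list
    holding one float in [-1.0, 1.0) per channel.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())
    # WAV PCM samples are little-endian signed 16-bit integers
    values = struct.unpack("<%dh" % (len(raw) // 2), raw)
    frames = [
        [v / 32768.0 for v in values[i:i + channels]]
        for i in range(0, len(values), channels)
    ]
    return frames, rate


def make_test_wav(samples, rate=16000):
    """Build a small mono 16-bit WAV in memory (stand-in for result.audio_data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


if __name__ == "__main__":
    frames, rate = wav_bytes_to_samples(make_test_wav([0, 16384, -32768]))
    print(rate, frames)
```

In the function above, the call would be wav_bytes_to_samples(result.audio_data). With MoviePy 1.x the frames could then presumably be wrapped as AudioArrayClip(numpy.array(frames), fps=rate), with AudioArrayClip imported from moviepy.audio.AudioClip; treat that last step as an untested assumption.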

If you still get some errors, you can get error details here:

Let me know if you have any further questions.