MoviePy: importing audio from text-to-speech in memory


I'm trying to use Azure text-to-speech in combination with MoviePy to create the audio stream for a video.

result = synthesizer.speak_ssml_async(xml_string).get()
stream = AudioDataStream(result)

The output of this process is:

<azure.cognitiveservices.speech.AudioDataStream at 0x2320cb87ac0>

However, MoviePy is not able to import this with the following command:

audioClip = AudioFileClip(stream)

This is giving me the error:

'AudioDataStream' object has no attribute 'endswith'

Do I need to convert the Azure stream to .wav? How do I do that? I need to do the entire process without writing .wav files locally (e.g. stream.save_to_wav_file), using only in-memory streams.

Can someone shed some light, please?


Solution

  • I wrote an HTTP-triggered Python function for you; try the code below:

    import azure.functions as func
    import azure.cognitiveservices.speech as speechsdk
    import tempfile
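    import imageio
    # Only needed on old imageio releases; newer ones drop this helper and get
    # ffmpeg from the imageio-ffmpeg package instead, so remove this call there.
    imageio.plugins.ffmpeg.download()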
    from moviepy.editor import AudioFileClip
    
    
    
    speech_key="<speech service key>"
    service_region="<speech service region>"
    temp_file_path = tempfile.gettempdir() + "/result.wav"
    text = 'hello, this is a test'
    
    def main(req: func.HttpRequest) -> func.HttpResponse:
        speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    
        auto_detect_source_language_config = speechsdk.languageconfig.AutoDetectSourceLanguageConfig()
    
        speech_synthesizer = speechsdk.SpeechSynthesizer(
            speech_config=speech_config,
            auto_detect_source_language_config=auto_detect_source_language_config,
            audio_config=None)
    
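        # Synthesize the text; audio_config=None keeps the audio in the result
        # instead of playing it on a speaker.
        result = speech_synthesizer.speak_text_async(text).get()
        if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
            return func.HttpResponse("Speech synthesis failed: " + str(result.reason), status_code=500)

        # Save the synthesized audio to a temp file so MoviePy can open it by path.
        stream = speechsdk.AudioDataStream(result)
        stream.save_to_wav_file(temp_file_path)

        myclip = AudioFileClip(temp_file_path)

        return func.HttpResponse(str(myclip.duration))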
    

    The logic is simple: get a speech stream from the Speech service, save it to a temp file, and use AudioFileClip to read its duration.
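
If you really need to avoid the temp file, one possible route is to skip AudioFileClip (which wants a file path) and build the clip from raw samples instead. Below is a minimal sketch, not tested against the Speech service. It assumes `result.audio_data` holds a complete 16-bit PCM RIFF/WAV byte string (I believe that is the case for the default Riff output formats, but please verify), and the helper names `wav_bytes_to_samples` and `clip_from_wav_bytes` are made up for illustration. `AudioArrayClip` is MoviePy's clip class for audio held in a NumPy array.

```python
import io
import struct
import wave


def wav_bytes_to_samples(wav_bytes):
    """Decode 16-bit PCM WAV bytes held in memory.

    Returns (frames, sample_rate), where frames is a list with one entry per
    audio frame and each entry is a list of per-channel floats in [-1, 1].
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # WAV stores little-endian signed 16-bit samples, interleaved by channel.
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    frames = [
        [sample / 32768.0 for sample in ints[i:i + n_channels]]
        for i in range(0, len(ints), n_channels)
    ]
    return frames, sample_rate


def clip_from_wav_bytes(wav_bytes):
    """Build a MoviePy audio clip from in-memory WAV bytes (hypothetical helper)."""
    # Imported here so the decoder above works without NumPy/MoviePy installed.
    import numpy as np
    from moviepy.audio.AudioClip import AudioArrayClip

    frames, sample_rate = wav_bytes_to_samples(wav_bytes)
    # AudioArrayClip expects an N x channels float array plus the sample rate.
    return AudioArrayClip(np.array(frames), fps=sample_rate)
```

In the function above you would then call `myclip = clip_from_wav_bytes(result.audio_data)` instead of the `save_to_wav_file` / `AudioFileClip` pair. If the header or sample format turns out to differ from the assumption, the decoder will need adjusting.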

    Let me know if you have any further questions.