Tags: azure, azure-speech, azure-speech-studio

Where are files saved to in Blob Storage when exporting a Text To Speech file?


I am trying to get up to speed on using Azure Speech Studio to create MP3s for Text To Speech.

Speech Studio

It is straightforward to create and test the file. Then I want to export it to an Azure Blob Storage location so it can be used in an app. This is the export dialog that is displayed: [Screenshot: Export dialog]

However, it is not clear where the file is actually stored. I see no setting in the Speech service that says which Azure Blob Storage account it is saved to, and after the export completes successfully I can't find any link to the stored file. My workaround is to download it to my local hard drive and then upload it to a known location, but it would be nice to skip the download/upload step.


Solution

  • As per this MS Doc,

    The Export option shown above saves the audio files to the Audio library. To export them to Blob Storage, you need to associate a storage account with the Azure Speech resource.

    This requires creating a BYOS (Bring Your Own Storage) Speech resource. A BYOS Speech resource lets you associate a storage account with the Speech resource when you create it. You can check whether BYOS is enabled for your subscription with the following PowerShell command, taken from this doc:

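    # Select the subscription to check
    $azureSubscriptionId = "<your_subscription_id>"
    Set-AzContext -SubscriptionId $azureSubscriptionId
    # List Cognitive Services features whose name matches "byox" (the BYOS-related feature)
    Get-AzProviderFeature -ListAvailable -ProviderNamespace "Microsoft.CognitiveServices" | Where-Object FeatureName -Match byox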
    

    If it is not enabled, you need to request BYOS access. This MS Doc walks through creating a BYOS Speech resource step by step. Make sure your storage account meets the requirements listed in that documentation.

    As a workaround, if you want to keep using your existing Speech resource, you can use the Python code below. It uses the azure-cognitiveservices-speech SDK with your existing Speech resource key and Blob Storage credentials, synthesizes the given text to audio in memory, and uploads the resulting bytes to the target container, so nothing is written to your local disk. You can change the speech configuration (voice, output format, and so on) as needed.

    Install the following packages before running the code:

    azure-cognitiveservices-speech
    azure-storage-blob
    

    Code:

    import azure.cognitiveservices.speech as speechsdk
    from azure.storage.blob import BlobServiceClient
    
    # Synthesize text to speech and return the audio as bytes (WAV/RIFF by default)
    def text_to_speechstream(speech_key, service_region, text, voice="en-US-JennyNeural"):
        # Set up the Speech SDK configuration
        speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
        speech_config.speech_synthesis_voice_name = voice
    
        # audio_config=None keeps the result in memory instead of playing it on the default speaker
        speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    
        # Synthesize speech and block until the result is available
        result = speech_synthesizer.speak_text_async(text).get()
        if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
            print("Speech stream created")
            return result.audio_data
    
        # Fail loudly instead of returning None, which would break the upload step
        error = ""
        if result.reason == speechsdk.ResultReason.Canceled:
            error = result.cancellation_details.error_details
        raise RuntimeError(f"Failed to create speech stream: {result.reason} {error}")
    
    # Upload the in-memory audio bytes directly to Azure Blob Storage
    def upload_to_blob(connection_string, container_name, audio_data, blob_name):
        blob_service_client = BlobServiceClient.from_connection_string(connection_string)
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    
        # overwrite=True replaces any existing blob with the same name
        blob_client.upload_blob(audio_data, overwrite=True)
        print(f"Successfully uploaded to Azure Blob Storage as: {blob_name}")
    
    # Azure Speech resource credentials
    speech_key = "XXXX"
    service_region = "<region>"
    
    # Azure Blob Storage credentials (the container must already exist)
    connection_string = "<Blob storage connection string>"
    container_name = "<container_name>"
    
    # Input text and blob name
    text = "Hi, My name is Govindula Rakesh"
    blob_name = "output_audio.wav"  # file name in Blob Storage; the default output format is WAV
    
    # Generate speech and get the audio as bytes
    audio_data = text_to_speechstream(speech_key, service_region, text)
    
    # Upload the audio directly to Azure Blob Storage
    upload_to_blob(connection_string, container_name, audio_data, blob_name)
    

    Output:

    [Screenshot: console output]

    Audio file uploaded to Blob storage:

    [Screenshot: the uploaded audio blob in the container]
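The question asks for MP3 files, while the code above produces WAV. Below is a sketch of an MP3 variant, assuming the same key, region, and connection-string placeholders as the main example. It requests MP3 output through `SpeechConfig.set_speech_synthesis_output_format` and uploads the blob with an `audio/mpeg` Content-Type, which helps browsers treat it as audio rather than a generic `application/octet-stream` download. The names `synthesize_mp3`, `upload_audio`, and `blob_content_type` are hypothetical helpers; the Azure imports sit inside the functions only so the content-type helper can be used without the SDKs installed.

```python
def blob_content_type(blob_name: str) -> str:
    """Hypothetical helper: pick a Content-Type from the blob's file extension."""
    extension = blob_name.rsplit(".", 1)[-1].lower() if "." in blob_name else ""
    return {"mp3": "audio/mpeg", "wav": "audio/wav"}.get(extension, "application/octet-stream")


def synthesize_mp3(speech_key: str, service_region: str, text: str,
                   voice: str = "en-US-JennyNeural") -> bytes:
    """Hypothetical helper: synthesize `text` to MP3 bytes held in memory."""
    import azure.cognitiveservices.speech as speechsdk  # imported lazily (see lead-in)

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_synthesis_voice_name = voice
    # Ask the service for MP3 (16 kHz, 32 kbit/s, mono) instead of the default WAV
    speech_config.set_speech_synthesis_output_format(
        speechsdk.SpeechSynthesisOutputFormat.Audio16Khz32KBitRateMonoMp3
    )

    # audio_config=None keeps the audio in memory instead of playing it on a speaker
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Speech synthesis failed: {result.reason}")
    return result.audio_data


def upload_audio(connection_string: str, container_name: str,
                 blob_name: str, audio_data: bytes) -> str:
    """Hypothetical helper: upload audio bytes with a matching Content-Type, return the blob URL."""
    from azure.storage.blob import BlobServiceClient, ContentSettings  # imported lazily

    service_client = BlobServiceClient.from_connection_string(connection_string)
    blob_client = service_client.get_blob_client(container=container_name, blob=blob_name)
    blob_client.upload_blob(
        audio_data,
        overwrite=True,
        content_settings=ContentSettings(content_type=blob_content_type(blob_name)),
    )
    return blob_client.url
```

Usage mirrors the original script: pass the bytes from `synthesize_mp3(speech_key, service_region, text)` to `upload_audio(connection_string, container_name, "output_audio.mp3", audio)`. The returned URL is only readable by an app if the container allows anonymous read access or you append a SAS token to it.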
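To change more than the voice (for example the speaking rate), the Speech SDK can synthesize from SSML with `speak_ssml_async` instead of `speak_text_async`. The sketch below shows a hypothetical `build_ssml` helper, assuming the `en-US-JennyNeural` voice from the code above and an illustrative rate value; the text and attribute values are XML-escaped so characters such as `&` or `<` don't break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               rate: str = "0%", lang: str = "en-US") -> str:
    """Hypothetical helper: wrap plain text in SSML that selects a voice and speaking rate."""
    return (
        f'<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        # prosody rate accepts relative values such as "-10%" (slower) or "+20%" (faster)
        f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
        "</voice></speak>"
    )
```

With a synthesizer created as in `text_to_speechstream`, the call would become `speech_synthesizer.speak_ssml_async(build_ssml(text, rate="-10%")).get()`; the voice named in the SSML is the one used for synthesis.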