I am trying to get up to speed on using Azure Speech Studio to create MP3s with Text to Speech.
It is straightforward to create and test the file. Then, I want to export it to an Azure Blob storage location so it can be used in an app. This is the dialog that is displayed:
However, it is not clear where the file is actually being stored. I see no setting in the Speech Service that says which Azure Blob Storage account it is saved to. After the export completes successfully, I look in vain for the storage link. So my workaround is to download the file to my local hard drive and then upload it to a known location. But it would be nice to be able to skip the download/upload step.
As per this MS Doc, the above option saves the audio files to the Audio library. To export them to Blob storage, you need to integrate a storage account with the Azure Speech service.
This requires creating a BYOS (Bring Your Own Storage) Speech resource. A BYOS Speech resource gives you the option to associate a storage account with the Speech resource when you create it. You can check whether BYOS is enabled for your subscription by running the following PowerShell command, taken from this doc.
$azureSubscriptionId = "<your_subscription_id>"
Set-AzContext -SubscriptionId $azureSubscriptionId
Get-AzProviderFeature -ListAvailable -ProviderNamespace "Microsoft.CognitiveServices" | where-object FeatureName -Match byox
If it is not enabled, you need to request BYOS access. This MS Doc has a step-by-step process for creating a BYOS Speech resource. Make sure you follow the storage account rules given in the documentation.
If you want to keep using your existing Speech resource, you can use the Python code below as a workaround. It uses azure-cognitiveservices-speech with your existing Speech service and Blob storage credentials, converts the given text to audio data, and uploads that data to the required container. You can change the speech configuration as per your requirement.
Make sure the azure-cognitiveservices-speech and azure-storage-blob packages are installed before running the code.
import azure.cognitiveservices.speech as speechsdk
from azure.storage.blob import BlobServiceClient

# Function to generate text-to-speech and return the audio data as bytes
def text_to_speechstream(speech_key, service_region, text, voice="en-US-JennyNeural"):
    # Set up the Speech SDK configuration
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_synthesis_voice_name = voice
    # audio_config=None keeps the audio in memory instead of playing it on the default speaker
    speech_synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    # Synthesize speech
    result = speech_synthesizer.speak_text_async(text).get()
    if result.reason == speechsdk.ResultReason.SynthesizingAudioCompleted:
        print("Speech stream created")
        return result.audio_data
    # Raise instead of returning None so a failed synthesis is never uploaded
    raise RuntimeError(f"Failed to create speech stream: {result.reason}")

# Function to upload the audio data directly to Azure Blob Storage
def upload_to_blob(connection_string, container_name, audio_data, blob_name):
    # Construct the BlobServiceClient (the connection string already identifies the storage account)
    blob_service_client = BlobServiceClient.from_connection_string(connection_string)
    blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)
    # Upload the audio data to the blob
    blob_client.upload_blob(audio_data, overwrite=True)
    print(f"Successfully uploaded to Azure Blob Storage as: {blob_name}")

# Azure speech resource credentials
speech_key = "XXXX"
service_region = "<region>"

# Azure Blob Storage credentials
connection_string = "<Blob storage connection string>"
container_name = "<container_name>"

# Input text and blob name
text = "Hi, My name is Govindula Rakesh"
blob_name = "output_audio.wav"  # file name in Blob Storage (the SDK's default output is WAV)

# Generate speech and get the audio as bytes
audio_data = text_to_speechstream(speech_key, service_region, text)

# Upload the audio data directly to Azure Blob Storage
upload_to_blob(connection_string, container_name, audio_data, blob_name)
Audio file uploaded to Blob storage:
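Since the question asks for MP3 output and a storage link, here is a minimal sketch of how the same workaround could be adapted. It assumes the output format can be chosen from the blob name's extension. The `AUDIO_SETTINGS` mapping and the `audio_settings_for` and `synthesize_and_upload` helpers are hypothetical names introduced for illustration, not part of either SDK. The format names are members of the Speech SDK's `SpeechSynthesisOutputFormat` enum, and the MP3 bitrate is just one possible choice.

```python
from pathlib import PurePosixPath

# Extension -> (SpeechSynthesisOutputFormat member name, MIME type).
# Illustrative mapping; other sample rates and bitrates exist in the SDK enum.
AUDIO_SETTINGS = {
    ".mp3": ("Audio16Khz32KBitRateMonoMp3", "audio/mpeg"),
    ".wav": ("Riff16Khz16BitMonoPcm", "audio/wav"),
}


def audio_settings_for(blob_name):
    """Return (output format name, content type) for a blob name (hypothetical helper)."""
    suffix = PurePosixPath(blob_name).suffix.lower()
    if suffix not in AUDIO_SETTINGS:
        raise ValueError(f"Unsupported audio extension: {suffix!r}")
    return AUDIO_SETTINGS[suffix]


def synthesize_and_upload(speech_key, service_region, connection_string,
                          container_name, blob_name, text,
                          voice="en-US-JennyNeural"):
    """Synthesize text in the format implied by blob_name, upload it, return the blob URL."""
    # Imported here so audio_settings_for can be used without the Azure SDKs installed.
    import azure.cognitiveservices.speech as speechsdk
    from azure.storage.blob import BlobServiceClient, ContentSettings

    format_name, content_type = audio_settings_for(blob_name)

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    speech_config.speech_synthesis_voice_name = voice
    speech_config.set_speech_synthesis_output_format(
        getattr(speechsdk.SpeechSynthesisOutputFormat, format_name))

    # audio_config=None keeps the audio in memory instead of playing it.
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"Speech synthesis failed: {result.reason}")

    blob_client = BlobServiceClient.from_connection_string(
        connection_string).get_blob_client(container=container_name, blob=blob_name)
    # Setting the content type lets browsers and apps treat the blob as audio.
    blob_client.upload_blob(result.audio_data, overwrite=True,
                            content_settings=ContentSettings(content_type=content_type))
    return blob_client.url
```

Calling `synthesize_and_upload(..., blob_name="output_audio.mp3", ...)` would then return the blob's URL. Whether an app can fetch that URL directly depends on the container's access level, so a SAS token or similar may still be needed.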