Tags: azure, text-to-speech, azure-cognitive-services

Azure Text to Speech (Cognitive Services) in web app - how to stop it from outputting audio?


I'm using Azure Cognitive Services for Text to Speech in a web app.

I return the bytes to the browser and it works great. However, on the server (or my local machine) the speechSynthesizer.SpeakTextAsync(inp) line also plays the audio through the speaker.

Is there a way to turn this off? This runs on a web server, and even if I ignore the sound, there's a delay while it plays the audio before the data is sent back.

Here's my code ...

            var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);

            speechConfig.SpeechSynthesisVoiceName = "fa-IR-FaridNeural";
            speechConfig.OutputFormat = OutputFormat.Detailed;

            using (var speechSynthesizer = new SpeechSynthesizer(speechConfig))
            {
                // todo - how to disable it saying it here?
                var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(inp);
                return Convert.ToBase64String(speechSynthesisResult.AudioData);
            }

Solution

    • You can pass an AudioConfig to the SpeechSynthesizer.

    • In this AudioConfig object you can specify the path to a .wav file on the server.

    • Whenever you call SpeakTextAsync, the audio is written to that .wav file instead of being played on the speaker.

    • You can read this audio file and run your own logic on it later.

    • Just add the following line before creating the SpeechSynthesizer object:

     var audioconfig = AudioConfig.FromWavFileOutput(filepath);
    

    Here, filepath is the location of the .wav file, as a string.

    Complete code:

    string filepath = "<file path>";
    var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
    // Send the synthesized audio to a .wav file instead of the default speaker.
    var audioconfig = AudioConfig.FromWavFileOutput(filepath);

    speechConfig.SpeechSynthesisVoiceName = "fa-IR-FaridNeural";
    // Kept from the question; this setting concerns recognition results, not synthesis.
    speechConfig.OutputFormat = OutputFormat.Detailed;

    using (var speechSynthesizer = new SpeechSynthesizer(speechConfig, audioconfig))
    {
        // The audio is written to the .wav file rather than played aloud.
        var speechSynthesisResult = await speechSynthesizer.SpeakTextAsync(inp);
        return Convert.ToBase64String(speechSynthesisResult.AudioData);
    }
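
If you would rather not write a file on the web server at all, the Speech SDK documentation also describes passing null as the AudioConfig, in which case the audio is only returned in memory through AudioData. The following is a minimal sketch under that assumption, not a definitive implementation. It reuses the key, region and voice from the question; the class and method names (SpeechToBase64, SynthesizeAsync) are made up for illustration.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper class, named for illustration only.
public static class SpeechToBase64
{
    // speechKey, speechRegion and text are supplied by the caller, as in the question.
    public static async Task<string> SynthesizeAsync(string speechKey, string speechRegion, string text)
    {
        var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
        speechConfig.SpeechSynthesisVoiceName = "fa-IR-FaridNeural";

        // A null AudioConfig means no speaker and no file output:
        // the synthesized audio is only available on the result object.
        using (var speechSynthesizer = new SpeechSynthesizer(speechConfig, null as AudioConfig))
        {
            var result = await speechSynthesizer.SpeakTextAsync(text);

            // Fail loudly instead of returning an empty or partial payload.
            if (result.Reason != ResultReason.SynthesizingAudioCompleted)
            {
                throw new InvalidOperationException($"Speech synthesis failed: {result.Reason}");
            }

            return Convert.ToBase64String(result.AudioData);
        }
    }
}
```

From a controller action this could be called as `var base64 = await SpeechToBase64.SynthesizeAsync(speechKey, speechRegion, inp);`. Compared with the .wav approach, there is no file to clean up and concurrent requests do not share a path.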