Tags: c#, amazon-web-services, text-to-speech, amazon-polly

AWS - Amazon Polly Text To Speech


I have a question about the Amazon Polly text-to-speech service.
I've integrated it into my chatbot so that it reads aloud what the bot writes to the user in the chat.
It works pretty well, but I don't know whether it is possible to stop the voice early, before she (I chose a female voice) finishes speaking. Sometimes I want to move on in the conversation without listening to the end of the sentence.

This is the code used for the integration:

// Client side (JavaScript)
function textToSpeech(text) {
  $.ajax({
    type: 'GET',
    url: '/Chat/TextToSpeech',
    data: { text: text },   // jQuery URL-encodes the text for the query string
    cache: false,
    success: function (result) {
      // result is the name of the MP3 file the controller saved under /Audios
      var audio = document.getElementById('botvoice');
      audio.src = '/Audios/' + result;
      audio.load();
      audio.play();
    }
  });
}
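A related pitfall, offered as a hedged sketch: if the user sends a new message while an earlier /Chat/TextToSpeech request is still pending, that request's success callback will still start playing the older answer. A small counter lets each callback check whether it is still the most recent one. The helper name createLatestOnlyGuard and the speechGuard variable below are hypothetical; they are not part of jQuery or the AWS SDK.

```javascript
// Hypothetical helper: lets only the most recently started request act on its result.
function createLatestOnlyGuard() {
  let latest = 0;
  return {
    // Start a new request and get its ticket number.
    next: function () {
      latest += 1;
      return latest;
    },
    // Invalidate every ticket issued so far (e.g. when the user moves on).
    cancel: function () {
      latest += 1;
    },
    // True only if no newer request started and no cancel happened since `ticket` was issued.
    isCurrent: function (ticket) {
      return ticket === latest;
    }
  };
}
```

In textToSpeech, one could create `var speechGuard = createLatestOnlyGuard();` once, call `var ticket = speechGuard.next();` before `$.ajax`, and return early from the success callback when `speechGuard.isCurrent(ticket)` is false. Calling `speechGuard.cancel()` when the user presses ENTER also discards any answer that is still on its way.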

Controller side:

public ActionResult TextToSpeech(string text)
{
    string filename = "";
    try
    {
        AWSCredentials credentials = new StoredProfileAWSCredentials("my_credential");
        AmazonPollyClient client = new AmazonPollyClient(credentials, Amazon.RegionEndpoint.EUWest1);

        // Synchronously ask Amazon Polly to describe the available TTS voices.
        DescribeVoicesRequest describeVoicesRequest = new DescribeVoicesRequest();
        DescribeVoicesResponse describeVoicesResult = client.DescribeVoices(describeVoicesRequest);
        List<Voice> voices = describeVoicesResult.Voices;

        // Create the speech synthesis request.
        SynthesizeSpeechRequest synthesizeSpeechRequest = new SynthesizeSpeechRequest();
        synthesizeSpeechRequest.Text = text;
        // Select the voice by its position in the list (fragile if the list changes).
        synthesizeSpeechRequest.VoiceId = voices[18].Id;
        // Ask for MP3 output.
        synthesizeSpeechRequest.OutputFormat = OutputFormat.Mp3;

        // Name the file after the MD5 hash of the text and save it in the Audios folder.
        string current_dir = AppDomain.CurrentDomain.BaseDirectory;
        filename = CalculateMD5Hash(text) + ".mp3";
        var path_audio = Path.Combine(current_dir, "Audios", filename);

        // SynthesizeSpeech returns the audio itself as a stream, not a presigned URL.
        var synthesizeSpeechResponse = client.SynthesizeSpeechAsync(synthesizeSpeechRequest).GetAwaiter().GetResult();

        // Write the whole stream to disk; "using" closes the file even if copying fails.
        using (FileStream wFile = new FileStream(path_audio, FileMode.Create))
        {
            synthesizeSpeechResponse.AudioStream.CopyTo(wFile);
        }
    }
    catch (Exception ex)
    {
        filename = ex.ToString();
    }

    return Json(filename, JsonRequestBehavior.AllowGet);
}

My chat has a text input where the user types a question and sends it to the bot by pressing ENTER. I tried setting audio.src = "" in that key handler: the voice stops, but the chat stays blocked, as if it were waiting for the end of the audio stream. I have to refresh the page to see new messages and responses.

Is there an Amazon Polly function I can call, with some particular parameter, to tell the service that I want to stop and clear the audio stream?


Solution

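  • Amazon Polly only synthesizes the audio: with OutputFormat.Mp3 it returns an MP3 audio stream, which your controller saves as a file. It is not responsible for playing that file.

    Any difficulty you have playing or stopping the audio comes from the code that plays the MP3 (here, the browser's <audio> element and your page script), not from the Amazon Polly service itself. By the time playback starts, the Polly request has already completed and the file is on your server, so there is nothing on the Polly side to stop or clear.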
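Since stopping is purely a client-side concern, the usual approach is to stop the <audio> element itself. The following is a minimal sketch, assuming the <audio id="botvoice"> element from the question; stopBotVoice and onChatKeyDown are hypothetical names, and sendQuestion stands in for whatever the page already does on ENTER. It uses the standard HTMLMediaElement methods pause() and load(), and removes the src attribute rather than setting it to an empty string, which the browser treats as an invalid media source.

```javascript
// Minimal sketch: stop the bot's current utterance on the client side.
// `audio` is assumed to be the <audio id="botvoice"> element used in textToSpeech.
function stopBotVoice(audio) {
  if (!audio) {
    return;
  }
  audio.pause();                 // stop playback immediately
  audio.removeAttribute('src');  // detach the MP3 file instead of assigning src = ""
  audio.load();                  // reset the element so it drops the old file
}

// Hypothetical ENTER handler: silence the bot, then send the question as before.
// `sendQuestion` stands in for the page's existing send logic.
function onChatKeyDown(event, audio, sendQuestion) {
  if (event.key !== 'Enter') {
    return false;
  }
  stopBotVoice(audio);
  sendQuestion();
  return true;
}
```

In the page this could be wired as `$('#chatInput').on('keydown', function (e) { onChatKeyDown(e, document.getElementById('botvoice'), sendMessage); });`, where #chatInput and sendMessage are placeholders for the real input element and send function. No call to Amazon Polly is involved, because the MP3 was already generated and saved by the controller.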