
Stream audio back to Twilio from Azure text to speech


I am trying to send an audio stream back to Twilio over a WebSocket so that Twilio plays it to the caller. I set up my bidirectional WebSocket connection following this guide. I can receive the call's media stream, but even though I am sending audio back, the caller hears nothing.

This is how I am creating the bidirectional stream:

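const VoiceResponse = require("twilio").twiml.VoiceResponse;

// Runs inside the Express webhook handler for the incoming call (`res` is its response)
const response = new VoiceResponse();
const connect = response.connect();
// <Connect><Stream> opens a bidirectional media stream to my WebSocket server
const stream = connect.stream({
  url: "wss://<my_websocket_address>",
});
response.say("Disconnecting call");
res.type("text/xml");
res.send(response.toString());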
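For comparison when debugging the webhook, the TwiML returned above should look roughly like the string built below. This is a hand-rolled sketch: `buildStreamTwiml` and `escapeXml` are hypothetical names, not part of the Twilio helper library, and the helper's exact output formatting may differ slightly. Per Twilio's `<Connect>` documentation, verbs after `<Connect>` should only run once the stream ends, so "Disconnecting call" is spoken after the WebSocket closes rather than before.

```javascript
// Hypothetical helper: escapes the five XML special characters.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: builds <Connect><Stream> TwiML by hand, approximating
// what VoiceResponse produces for the snippet above.
function buildStreamTwiml(wsUrl, sayText) {
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    "<Response>" +
    `<Connect><Stream url="${escapeXml(wsUrl)}"/></Connect>` +
    // Verbs after <Connect> should only run once the WebSocket stream closes.
    `<Say>${escapeXml(sayText)}</Say>` +
    "</Response>"
  );
}

console.log(buildStreamTwiml("wss://example.com/media", "Disconnecting call"));
```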

This is how I am converting the text to audio with Azure Cognitive Services (the Speech SDK):

const sdk = require("microsoft-cognitiveservices-speech-sdk");
const { PassThrough } = require("stream");

// `constants` is my own config module holding the subscription key and region
const textToSpeech = async (text) => {
  return new Promise((resolve, reject) => {
    const speechConfig = sdk.SpeechConfig.fromSubscription(
      constants.subscriptionkey,
      constants.region
    );
    speechConfig.speechSynthesisLanguage = "en-US";
    speechConfig.speechSynthesisVoiceName = "en-US-AvaMultilingualNeural";
    // Twilio media streams carry 8 kHz, 8-bit, mono mu-law audio
    speechConfig.speechSynthesisOutputFormat =
      sdk.SpeechSynthesisOutputFormat.Raw8Khz8BitMonoMULaw;

    // null audioConfig: keep the audio in result.audioData instead of playing it
    const audioConfig = null;
    const synthesizer = new sdk.SpeechSynthesizer(speechConfig, audioConfig);

    synthesizer.speakTextAsync(
      text,
      (result) => {
        const { audioData } = result; // ArrayBuffer of raw mu-law bytes

        synthesizer.close();
        const bufferStream = new PassThrough();
        bufferStream.end(Buffer.from(audioData).toString("base64"));
        console.log("TTS DONE");
        resolve(bufferStream);
      },
      (error) => {
        synthesizer.close();
        reject(error);
      }
    );
  });
};
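Note that `textToSpeech` resolves a `PassThrough` stream, not a string, so `payload` in the next snippet is a stream object rather than base64 text. If the text itself is needed for the `payload` field, the stream has to be read out first. A minimal sketch using only Node's built-in `stream` module (`streamToString` is a hypothetical helper name); alternatively, the success callback could resolve `Buffer.from(audioData).toString("base64")` directly and skip the stream.

```javascript
const { PassThrough } = require("stream");

// Hypothetical helper: collects every chunk a readable stream emits and
// resolves with the concatenated text once the stream ends.
function streamToString(readable) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    readable.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    readable.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    readable.on("error", reject);
  });
}

// Usage: mirrors what textToSpeech resolves (a PassThrough holding base64 text).
const demo = new PassThrough();
demo.end(Buffer.from([0xff, 0x7f, 0x00]).toString("base64"));
streamToString(demo).then((payload) => console.log(payload)); // "/38A"
```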

And this is how I finally send the audio back:

let payload = await tts.textToSpeech("Hello, how can i help you?")
var json = {
  "event": "media",
  "streamSid": "MZ058f55e473ebabd11f57552bc9952861",
  "media": {
    "payload": payload
  }
}
this.connection.send(json)

I understand that I need to send base64-encoded mu-law audio to Twilio, and I believe I am already doing that.
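One way to test that belief before sending is to check that `payload` really is a base64 string and see how much audio it holds. The sketch below assumes the `Raw8Khz8BitMonoMULaw` format set in `textToSpeech` (8,000 one-byte samples per second, so bytes / 8000 gives seconds); `describeMulawPayload` is a hypothetical name. Given the `PassThrough` object that `textToSpeech` resolves, it throws a `TypeError`, which would flag the payload type as a problem.

```javascript
// Hypothetical debugging helper. Assumes 8 kHz, 8-bit mono mu-law audio,
// i.e. one byte per sample and 8000 samples per second.
const MULAW_BYTES_PER_SECOND = 8000;

function describeMulawPayload(payload) {
  if (typeof payload !== "string") {
    throw new TypeError(`payload must be a base64 string, got ${typeof payload}`);
  }
  // Standard base64: groups of 4 characters from A-Z, a-z, 0-9, '+', '/',
  // with at most two '=' padding characters at the end.
  if (payload.length % 4 !== 0 || !/^[A-Za-z0-9+/]*={0,2}$/.test(payload)) {
    throw new Error("payload is not valid base64");
  }
  const bytes = Buffer.from(payload, "base64").length;
  return { bytes, seconds: bytes / MULAW_BYTES_PER_SECOND };
}

// One second of mu-law "silence" (0xFF encodes zero amplitude in G.711 mu-law).
const oneSecond = Buffer.alloc(8000, 0xff).toString("base64");
console.log(describeMulawPayload(oneSecond)); // { bytes: 8000, seconds: 1 }
```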

Can someone point out where I am going wrong? Also, is there any way to inspect the WebSocket messages I send from the Twilio dashboard?


Solution

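Twilio Developer Evangelist here.

You need to stringify the JSON object you're sending to Twilio. Twilio expects each message on the media stream to be JSON text, so serialize the object before sending it:

this.connection.send(JSON.stringify(json))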
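Putting the fix together with the payload-type point from the question: a minimal sketch that takes raw mu-law bytes (the `audioData` ArrayBuffer from `speakTextAsync` can be passed in directly), and sends them as stringified `media` messages. Splitting the audio into 160-byte frames (20 ms at 8 kHz) is an assumption of this sketch, a common choice rather than something stated in the answer. `sendMulawAudio` is a hypothetical name, and `ws` is anything with a `send(string)` method, such as `this.connection` in the question.

```javascript
// 8 kHz, 8-bit mono mu-law: 160 bytes = 20 ms of audio (assumed frame size).
const FRAME_BYTES = 160;

// Hypothetical helper: sends raw mu-law audio to Twilio as "media" messages.
// `mulawBytes` may be a Buffer, an ArrayBuffer, or an array of byte values.
function sendMulawAudio(ws, streamSid, mulawBytes, frameBytes = FRAME_BYTES) {
  const audio = Buffer.from(mulawBytes);
  let sent = 0;
  for (let offset = 0; offset < audio.length; offset += frameBytes) {
    const frame = audio.subarray(offset, offset + frameBytes);
    const message = {
      event: "media",
      streamSid,
      media: { payload: frame.toString("base64") },
    };
    // The key fix: send JSON text, not the JavaScript object itself.
    ws.send(JSON.stringify(message));
    sent += 1;
  }
  return sent; // number of media messages sent
}

// Usage with a stand-in socket that records what would be sent.
const fakeSocket = { messages: [], send(data) { this.messages.push(data); } };
const count = sendMulawAudio(fakeSocket, "MZ058f55e473ebabd11f57552bc9952861", Buffer.alloc(400, 0xff));
console.log(count); // 3 (160 + 160 + 80 bytes)
```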