Tags: javascript, node.js, amazon-web-services, chatbot, amazon-lex

How can I play an audioStream returned by the Amazon Lex SDK when recognising an utterance?


I have created an Amazon Lex chatbot with a Node.js REST API, following the official docs.

I'm sending a RecognizeUtteranceCommand & receiving an audioStream in the response.

How can I play this audioStream?

There are no examples in the documentation of how to play the audio stream.

My code:

const { LexRuntimeV2Client, RecognizeUtteranceCommand } = require("@aws-sdk/client-lex-runtime-v2");

const client = new LexRuntimeV2Client({ region: 'us-east-1' });

const getResponse = async () => {
    const lexparams = {
        "botAliasId": "XXXXXXXXXXX",
        "botId": "XXXXXXXXXX",
        "localeId": "en_US",
        "inputStream": <blob XX XX XX ....>, // audio bytes captured from the user
        "requestContentType": "audio/x-l16; sample-rate=16000; channel-count=1",
        "sessionId": "XXXXXXXXXXXX",
        "responseContentType": "audio/mpeg"
    };

    const command = new RecognizeUtteranceCommand(lexparams);
    const response = await client.send(command);
    console.log(response);
};

getResponse();

The response:

"messages": [
    {
        "content": "How was the day ?",
        "contentType": "PlainText"
    }
    ],
audioStream: Http2Stream {
    id: 1,
    closed: false,
    destroyed: false,
    state: {
    state: 5,
    weight: 16,
    sumDependencyWeight: 0,
    localClose: 1,
    remoteClose: 0,
    localWindowSize: 65535
    },
    readableState: ReadableState {
    objectMode: false,
    highWaterMark: 16384,
    buffer: BufferList { head: null, tail: null, length: 0 },
    length: 0,
    pipes: [],
    flowing: null,
    ended: false,
    endEmitted: false,
    reading: false,
    constructed: true,
    sync: true,
    needReadable: false,
    emittedReadable: false,
    readableListening: false,
    resumeScheduled: false,
    errorEmitted: false,
    emitClose: true,
    autoDestroy: false,
    destroyed: false,
    errored: null,
    closed: false,
    closeEmitted: false,
    defaultEncoding: 'utf8',
    awaitDrainWriters: null,
    multiAwaitDrain: false,
    readingMore: true,
    dataEmitted: false,
    decoder: null,
    encoding: null,
    [Symbol(kPaused)]: null
    },
    writableState: WritableState {
    objectMode: false,
    highWaterMark: 16384,
    finalCalled: true,
    needDrain: true,
    ending: true,
    ended: true,
    finished: true,
    destroyed: false,
    decodeStrings: false,
    defaultEncoding: 'utf8',
    length: 0,
    writing: false,
    corked: 0,
    sync: false,
    bufferProcessing: false,
    onwrite: [Function: bound onwrite],
    writecb: null,
    writelen: 0,
    afterWriteTickInfo: null,
    buffered: [],
    bufferedIndex: 0,
    allBuffers: true,
    allNoop: true,
    pendingcb: 0,
    constructed: true,
    prefinished: true,
    errorEmitted: false,
    emitClose: true,
    autoDestroy: false,
    errored: null,
    closed: false,
    closeEmitted: false,
    [Symbol(kOnFinished)]: []
    }
}

Solution

  • Defined as part of the RecognizeUtteranceCommandOutput interface, audioStream is of type:

    Readable | ReadableStream | Blob

    In other words, you ultimately have a stream of byte data in the audio/mpeg format.

    It is up to you how to consume this stream, but for playing this audio on the client's device, I would:

    1. Return response.audioStream back from the endpoint (a server-side sketch follows this list)
    2. Create a Uint8Array from the audio stream and, subsequently, a Blob instance with the MIME type audio/mpeg
    3. Create an object URL for the Blob instance and attach it to the src attribute of an HTML <audio> element
    4. Call the .play() method on the audio element
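
    For step 1, the audioStream Lex returns in Node.js is a Readable, so it needs to be drained into a Buffer before it can be sent back over HTTP. A minimal sketch, assuming an Express server with a hypothetical /utterance endpoint (streamToBuffer is a helper introduced here, not part of the SDK):

    const express = require("express");
    const { LexRuntimeV2Client, RecognizeUtteranceCommand } = require("@aws-sdk/client-lex-runtime-v2");

    const app = express();
    const client = new LexRuntimeV2Client({ region: "us-east-1" });

    // Drain a Node.js Readable into a single Buffer
    const streamToBuffer = (stream) =>
        new Promise((resolve, reject) => {
            const chunks = [];
            stream.on("data", (chunk) => chunks.push(chunk));
            stream.on("error", reject);
            stream.on("end", () => resolve(Buffer.concat(chunks)));
        });

    app.post("/utterance", async (req, res) => {
        const lexparams = { /* same request parameters as in the question */ };
        const response = await client.send(new RecognizeUtteranceCommand(lexparams));

        // Collect the MP3 bytes and return them as the response body
        const audioBuffer = await streamToBuffer(response.audioStream);
        res.set("Content-Type", "audio/mpeg");
        res.send(audioBuffer);
    });

    app.listen(3000);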

    This should work on the client side:

    // "audioStream" here is the raw audio/mpeg byte data received from the
    // endpoint (e.g. an ArrayBuffer), not the Node.js Readable itself
    var stream = new Uint8Array(audioStream);
    var audioBlob = new Blob([stream], { type: 'audio/mpeg' });

    // Point a detached <audio> element at a temporary object URL for the Blob
    var audioElement = document.createElement('audio');
    var objectUrl = window.URL.createObjectURL(audioBlob);

    audioElement.src = objectUrl;
    audioElement.play();
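
    Putting the pieces together, a minimal client-side usage sketch could fetch the bytes from the hypothetical /utterance endpoint above as an ArrayBuffer and feed them into that snippet (playUtteranceAudio is a name made up here):

    // Fetch the audio/mpeg bytes from the hypothetical /utterance endpoint
    // and play them with the approach shown above
    const playUtteranceAudio = async () => {
        const res = await fetch('/utterance', { method: 'POST' });
        const audioStream = await res.arrayBuffer(); // raw MP3 bytes

        const stream = new Uint8Array(audioStream);
        const audioBlob = new Blob([stream], { type: 'audio/mpeg' });
        const audioElement = document.createElement('audio');
        const objectUrl = window.URL.createObjectURL(audioBlob);

        audioElement.src = objectUrl;
        audioElement.play();
    };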