
How to decode binary audio data?


I'm still new to web development and I'm making a chatbot, but I want to run the responses through Google's Text-to-Speech first and then play the sound on the client. So: client sends message to server -> server creates a response -> server sends it to Google -> gets audio data back -> sends it to client -> client plays it. I got all the way to the last step, but now I'm out of my depth.

I've been doing some googling, and there seems to be a lot of info on playing audio from binary data, audio contexts and so on, so I've put together a function, but it doesn't work. Here's what I've done:

export const SendMessage: Client.Common.Footer.API.SendMessage = async message => {
    const baseRoute = process.env.REACT_APP_BASE_ROUTE;
    const port = process.env.REACT_APP_SERVER_PORT;
    const audioContext = new AudioContext();
    let audio: any;
    const url = baseRoute + ":" + port + "/ChatBot";
    console.log("%c Sending post request...", "background: #1fa67f; color: white", url, JSON.stringify(message));
    let responseJson = await fetch(url, {
        method: "POST",
        mode: "cors",
        headers: {
            Accept: "application/json",
            "Content-Type": "application/json"
        },
        body: JSON.stringify(message)
    });
    let response = await responseJson.json();
    await audioContext.decodeAudioData(
        new ArrayBuffer(response.data.audio.data),
        buffer => {
            audio = buffer;
        },
        error => console.log("===ERROR===\n", error)
    );
    const source = audioContext.createBufferSource();
    source.buffer = audio;
    source.connect(audioContext.destination);
    source.start(0);
    console.log("%c Post response:", "background: #1fa67f; color: white", url, response);
};

This function sends the message to the server and gets back the response message and audio data. I do have some sort of binary data in response.data.audio.data, but I'm getting an error saying the audio data can't be decoded (the error callback in the decodeAudioData call is firing). I know the data is valid because, on my server, I use the following code to turn it into an mp3 file, which plays fine:

const writeFile = util.promisify(fs.writeFile);
await writeFile("output/TTS.mp3", response.audioContent, "binary");

I have almost no knowledge of how binary data is handled here and what could be going wrong. Do I need to specify further parameters to decode the binary data correctly? How do I know which? I would like to understand what's actually happening here and not just copy paste some solution.

EDIT:

So it seems the array buffer isn't being created properly. If I run this code:

    console.log(response);
    const audioBuffer = new ArrayBuffer(response.data.audio.data);
    console.log("===audioBuffer===", audioBuffer);
    audio = await audioContext.decodeAudioData(audioBuffer);

The response comes out as:

{message: "Message successfully sent.", status: 1, data: {…}}
    message: "Message successfully sent."
    status: 1
    data:
        message: "Sorry, I didn't understand your question, try rephrasing."
        audio:
            type: "Buffer"
            data: Array(14304)
                [0 … 9999]
                [10000 … 14303]
                length: 14304
            __proto__: Array(0)
        __proto__: Object
    __proto__: Object
__proto__: Object

but the buffer logs as this:

===audioBuffer=== 
ArrayBuffer(0) {}
    [[Int8Array]]: Int8Array []
    [[Uint8Array]]: Uint8Array []
    [[Int16Array]]: Int16Array []
    [[Int32Array]]: Int32Array []
    byteLength: 0
__proto__: ArrayBuffer

Clearly JS doesn't understand the format in my response object, but that's what I got from Google's Text-to-Speech API. Maybe I'm sending it wrong from my server? Like I said before, on my server the following code turns that data into an mp3 file:

    const writeFile = util.promisify(fs.writeFile);
    await writeFile("output/TTS.mp3", response.audioContent, "binary");
    return response.audioContent;

Where response.audioContent is also sent to the client like so:


//in index.ts
...
const app = express();
app.use(bodyParser.json());
app.use(cors(corsOptions));

app.post("/TextToSpeech", TextToSpeechController);
...
//textToSpeech.ts
export const TextToSpeechController = async (req: Req<Server.API.TextToSpeech.RequestQuery>, res: Response) => {
    let response: Server.API.TextToSpeech.ResponseBody = {
        message: null,
        status: CONSTANTS.STATUS.ERROR,
        data: undefined
    };
    try {
        console.log("===req.body===", req.body);
        if (!req.body) throw new Error("No message received");
        const audio = await TextToSpeech({ message: req.body.message });
        response = {
            message: "Audio file successfully created!",
            status: CONSTANTS.STATUS.SUCCESS,
            data: audio
        };
        res.send(response);
    } catch (error) {
        response = {
            message: "Error converting text to speech: " + error.message,
            status: CONSTANTS.STATUS.ERROR,
            data: undefined
        };
        res.json(response);
    }
};
...

I find it weird that, on my server, response.audioContent logs as:

===response.audioContent=== <Buffer ff f3 44 c4 00 00 00 03 48 01 40 00 00 f0 
a3 0f fc 1a 00 11 e1 48 7f e0 e0 87 fc b8 88 40 1c 7f e0 4c 03 c1 d9 ef ff ec 
3e 4c 02 c7 88 7f ff f9 ff ff ... >

But, on the client, it's

audio:
            type: "Buffer"
            data: Array(14304)
                [0 … 9999]
                [10000 … 14303]
                length: 14304
            __proto__: Array(0)
        __proto__: Object

I tried passing response.data, response.data.audio and response.data.audio.data to new ArrayBuffer() but all result in the same empty buffer.


Solution

  • A couple of things are going wrong in your code. You can't populate an ArrayBuffer via that constructor: its argument is a byte length, not the data, which is why you end up with an empty ArrayBuffer(0). Also, your call to decodeAudioData is async and will leave audio undefined by the time you assign it to source.buffer, so I would recommend updating the call to decodeAudioData to the newer promise-based form. (As a side note, what the client receives under audio is just the JSON-serialised form of a Node Buffer, { type: "Buffer", data: [...] }, so the byte values are all there; they just need to be put back into a typed array.)

    EDIT: You must be doing something unusual with your call to Google Text-to-Speech and its returned result for the previous example I posted not to work, because whether I use an mp3 file or the response from Google, both play fine once the correct reference to the buffer is passed.

    The fact that it works with the mp3 file but not with text to speech suggests you may not be referencing the correct property in the result returned from Google's API. The response from the API call is an array, so make sure that you are referencing index 0 of that array (see textToSpeech.js below).
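
    To make those two fixes concrete, here is a minimal sketch of the client-side decode, assuming the response shape from your question (data.audio is the JSON-serialised Buffer, i.e. { type: "Buffer", data: [...] }); the function name is just illustrative:

    // Rebuild the raw bytes from the plain number array and decode them.
    async function playAudioFromResponse(response) {
        const audioContext = new AudioContext();
        // Uint8Array.from() copies the byte values; .buffer is the underlying
        // ArrayBuffer. new ArrayBuffer(...) would not work here, because its
        // argument is a length, not the data.
        const bytes = Uint8Array.from(response.data.audio.data);
        // Promise-based decodeAudioData, so audio is set before it is used.
        const audio = await audioContext.decodeAudioData(bytes.buffer);
        const source = audioContext.createBufferSource();
        source.buffer = audio;
        source.connect(audioContext.destination);
        source.start(0);
    }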

    Full application described below.

    // textToSpeech.js
    const textToSpeech = require('@google-cloud/text-to-speech');
    const client = new textToSpeech.TextToSpeechClient();
    
    module.exports = {
        say: async function(text) {
            const request = {
                input: { text },
                voice: { languageCode: 'en-US', ssmlGender: 'NEUTRAL' },
                audioConfig: { audioEncoding: 'MP3' },
            };
            const response = await client.synthesizeSpeech(request);
            return response[0].audioContent;
        }
    }
    
    // server.js
    const express = require('express');
    const path = require('path');
    const app = express();
    const textToSpeechService = require('./textToSpeech');
    
    app.get('/', (req, res) => {
        res.sendFile(path.join(__dirname + '/index.html'));
    });
    
    app.get('/speech', async (req, res) => {
        const buffer = await textToSpeechService.say('hello world');
        res.json({
            status: `y'all good :)`,
            data: buffer
        })
    });
    
    app.listen(3000);
    
    // index.html
    <!DOCTYPE html>
    <html>
        <script>
            async function play() {
                const audioContext = new AudioContext();
                const request = await fetch('/speech');
                const response = await request.json();
                const arr = Uint8Array.from(response.data.data)
                const audio = await audioContext.decodeAudioData(arr.buffer);
                const source = audioContext.createBufferSource();
                source.buffer = audio;
                source.connect(audioContext.destination);
                source.start(0);
            }
        </script>
        <body>
            <h1>Hello Audio</h1>
            <button onclick="play()">play</button>
        </body>
    </html>
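
    As a design note (not part of the setup above): rather than JSON-serialising the Buffer, the server could also send the raw mp3 bytes with an audio content type, and the client could read them straight into an ArrayBuffer, skipping the Uint8Array.from() step. A rough sketch, reusing the textToSpeech service above; the /speech-raw route name is made up:

    // server.js — alternative endpoint returning the raw mp3 bytes
    app.get('/speech-raw', async (req, res) => {
        const buffer = await textToSpeechService.say('hello world');
        res.type('audio/mpeg').send(buffer);
    });

    // client — read the response body directly as an ArrayBuffer
    async function playRaw() {
        const audioContext = new AudioContext();
        const request = await fetch('/speech-raw');
        const arrayBuffer = await request.arrayBuffer();
        const audio = await audioContext.decodeAudioData(arrayBuffer);
        const source = audioContext.createBufferSource();
        source.buffer = audio;
        source.connect(audioContext.destination);
        source.start(0);
    }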