Tags: javascript, python-3.x, numpy, flask, html5-audio

Send audio data represented as a NumPy array from Python to JavaScript


I have a TTS (text-to-speech) system that produces audio in numpy-array form whose data type is np.float32. This system is running in the backend and I want to transfer the data from the backend to the frontend to be played when a certain event happens.

The obvious solution to this problem is to write the audio data to disk as a wav file and then pass the path to the frontend to be played. This worked fine, but I don't want to do that for administrative reasons. I want to transfer only the audio data (the numpy array) to the frontend.

What I have done so far is the following:

backend

text = "Hello"
wav, sr = tts_model.synthesize(text)
# wav is a float32 numpy array; convert it to a plain list so it can be JSON-serialized
data = {"snd": wav.tolist()}
flask_response = app.response_class(response=flask.json.dumps(data),
                                    status=200,
                                    mimetype='application/json')
# then return flask_response

frontend

// gets wav from backend
let arrayData = new Float32Array(wav);
let blob = new Blob([ arrayData ]);
let url = URL.createObjectURL(blob);
let snd = new Audio(url);
snd.play()

That is what I have done so far, but the JavaScript throws the following error:

Uncaught (in promise) DOMException: Failed to load because no supported source was found.

This is the gist of what I'm trying to do. I'm sorry that you can't reproduce the error, since you don't have the TTS system, so this is an audio file generated by it, which you can use to see what I'm doing wrong.

Other things I tried:

  • Changed the audio datatype to np.int8 and np.int16, to be cast in JavaScript by Int8Array() and Int16Array() respectively.
  • Tried different types when creating the blob, such as {"type": "application/text;charset=utf-8;"} and {"type": "audio/ogg; codecs=opus;"}.

I have been struggling with this issue for a long time, so any help is appreciated!


Solution

  • Your sample, as is, does not work out of the box (it does not play).

    However, it works with:

    • StarWars3.wav (retrieved from cs.uic.edu): OK
    • your sample encoded in PCM16 instead of PCM32: OK (check the wav metadata)
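One way to "check the wav metadata" is to read the `fmt ` chunk of the file header directly. The following is a minimal sketch, not part of the original answer: the function name `read_wav_format` is hypothetical, and it assumes a plain RIFF/WAVE file held in memory as bytes. A format tag of 1 means integer PCM and 3 means IEEE float samples.

```python
import struct


def read_wav_format(wav_bytes):
    """Return (format_tag, channels, sample_rate, bits_per_sample).

    Hypothetical helper: walks the RIFF chunks until it finds "fmt ".
    """
    if wav_bytes[:4] != b"RIFF" or wav_bytes[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    offset = 12  # first chunk starts right after the 12-byte RIFF header
    while offset + 8 <= len(wav_bytes):
        chunk_id, chunk_size = struct.unpack_from("<4sI", wav_bytes, offset)
        if chunk_id == b"fmt ":
            # format tag, channels, sample rate, byte rate, block align, bits per sample
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", wav_bytes, offset + 8
            )
            return tag, channels, rate, bits
        # chunks are padded to an even number of bytes
        offset += 8 + chunk_size + (chunk_size & 1)

    raise ValueError("no fmt chunk found")
```

Calling it on the bytes of a file, for example `read_wav_format(open("sample_16.wav", "rb").read())`, shows whether the samples are 16-bit integer PCM or 32-bit data.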

    Flask

    from flask import Flask, render_template, json
    import base64
    
    app = Flask(__name__)
    
    with open("sample_16.wav", "rb") as binary_file:
        # Read the whole file at once
        data = binary_file.read()
        wav_file = base64.b64encode(data).decode('UTF-8')
    
    @app.route('/wav')
    def hello_world():
        data = {"snd": wav_file}
        res = app.response_class(response=json.dumps(data),
            status=200,
            mimetype='application/json')
        return res
    
    @app.route('/')
    def stat():
        return render_template('index.html')
    
    if __name__ == '__main__':
        app.run(debug = True)
    

    JavaScript (in index.html)

    
      <audio controls></audio>
      <script>
        ;(async _ => {
          const res = await fetch('/wav')
          let {snd: b64buf} = await res.json()
          // Build a data URI from the base64-encoded wav bytes and hand it to the <audio> element
          document.querySelector('audio').src = "data:audio/wav;base64," + b64buf;
        })()
      </script>
    

    Original Poster Edit

    So, what I ended up doing to solve my problem (before applying this solution) was to:

    • First, change the datatype from np.float32 to np.int16:
    wav = (wav * np.iinfo(np.int16).max).astype(np.int16)
    
    • Write the numpy array into a temporary wav file using scipy.io.wavfile:
    from scipy.io import wavfile
    wavfile.write(".tmp.wav", sr, wav)
    
    • Read the bytes from the tmp file:
    # read the bytes
    with open(".tmp.wav", "rb") as fin:
        wav = fin.read()
    
    • Delete the temporary file
    import os
    os.remove(".tmp.wav")
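The temporary-file steps above can also be done entirely in memory. This is a minimal sketch under stated assumptions, not the original poster's code: it assumes a mono signal scaled to [-1.0, 1.0], it swaps scipy.io.wavfile for the standard-library `wave` module writing into an `io.BytesIO` buffer, and the function name `float32_to_wav_base64` is hypothetical.

```python
import base64
import io
import wave

import numpy as np


def float32_to_wav_base64(samples, sample_rate):
    """Encode mono float32 samples as a base64 string of a PCM16 wav file."""
    # Clip first so out-of-range samples do not wrap around when cast to int16.
    clipped = np.clip(np.asarray(samples, dtype=np.float32), -1.0, 1.0)
    pcm16 = (clipped * np.iinfo(np.int16).max).astype("<i2")  # little-endian int16

    buffer = io.BytesIO()  # in-memory stand-in for the temporary file
    with wave.open(buffer, "wb") as wav_out:
        wav_out.setnchannels(1)  # assumption: mono output
        wav_out.setsampwidth(2)  # 2 bytes per sample = PCM16
        wav_out.setframerate(int(sample_rate))
        wav_out.writeframes(pcm16.tobytes())

    # Base64 text can be embedded in JSON and in a data: URI.
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

The returned string could then be placed in the "snd" field of the JSON response, for example `data = {"snd": float32_to_wav_base64(wav, sr)}`, and played on the frontend through the `data:audio/wav;base64,` URI shown in the answer.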