Tags: javascript, audio, web-audio-api

How to get a smaller piece of audio from a larger recording captured with the browser's Web Audio API


I'm making a speech-to-text tool. I'm capturing audio in real time (using the Web Audio API in Chrome) and sending it to a server that converts the audio to text.

I'd like to extract pieces of the whole recording because I only want to send sentences and skip the silences (the API I use charges for usage). The problem is that I don't know how to split the whole recording into pieces.

I was using MediaRecorder to capture the audio:

    // recording 

    this.recorder = new MediaRecorder(stream)
    this.recorder.ondataavailable = async (e) => {
      const buffer = await e.data.arrayBuffer()
      // this.chunks is a plain array, so append with push()
      this.chunks.push(new Uint8Array(buffer))
    }
    this.recorder.start(1000) // emit a chunk roughly every 1000 ms

Now this.chunks holds an array of buffers, roughly one per second of audio.
If I try to play the whole recording by passing all the captured buffers, decodeAudioData decodes it and it plays correctly:

    // play back the whole recording: <- this works
    const combinedChunks = this.chunks.reduce((prev, chunk) => [...prev,...chunk], [])
    const arrChunks = new Uint8Array(combinedChunks)
    this.repAudioContext = new AudioContext()
    this.repAudioBuffer = await this.repAudioContext.decodeAudioData(
      arrChunks.buffer
    )

    this.repSourceNode = this.repAudioContext.createBufferSource()
    this.repSourceNode.buffer = this.repAudioBuffer

    this.repSourceNode.connect(this.repAudioContext.destination)
    this.repSourceNode.start()

That works ^ because I'm passing every chunk. But since I want to extract pieces of the audio, I need to be able to select only the chunks I want to play, and that's what I can't do. As soon as I leave out the first chunk, decoding fails with: decodeAudioData - Unable to decode audio data.

    // play back only part of the captured audio: <- this won't work
    const combinedChunks = this.chunks.slice(1).reduce((prev, chunk) => [...prev,...chunk], []) // <- skipping first chunk
    const arrChunks = new Uint8Array(combinedChunks)
    this.repAudioContext = new AudioContext()
    this.repAudioBuffer = await this.repAudioContext.decodeAudioData(
      arrChunks.buffer
    )

    this.repSourceNode = this.repAudioContext.createBufferSource()
    this.repSourceNode.buffer = this.repAudioBuffer

    this.repSourceNode.connect(this.repAudioContext.destination)
    this.repSourceNode.start()

I suspect this is because the first chunk contains the headers or other metadata of the captured audio, but I can't find a way around it.
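That suspicion can be checked directly. The sketch below assumes the recording uses Chrome's default WebM container, whose files open with the four-byte EBML magic number 1A 45 DF A3. The helper name startsWithEbmlHeader is hypothetical and not part of any browser API.

```javascript
// EBML magic number that opens a WebM/Matroska file.
// Assumption: MediaRecorder is producing WebM (Chrome's default).
const EBML_MAGIC = [0x1a, 0x45, 0xdf, 0xa3];

// Returns true when `bytes` (a Uint8Array) begins with the EBML header.
function startsWithEbmlHeader(bytes) {
  return (
    bytes.length >= EBML_MAGIC.length &&
    EBML_MAGIC.every((value, i) => bytes[i] === value)
  );
}

// Hypothetical usage with the chunks recorded above:
//   startsWithEbmlHeader(this.chunks[0])  // expected to be true for WebM
//   startsWithEbmlHeader(this.chunks[1])  // later chunks continue mid-stream
```

If only the first chunk passes this check, the later chunks are not standalone files, which would explain why decodeAudioData rejects them on their own.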

Can anyone give me some advice? Is there a different API I should be using? What's the proper way to extract a smaller piece of audio from a larger one so that I can play it and save it as a file?


Solution

  • I found the answer to my own question: I was using the wrong approach.

    MediaRecorder hands over encoded container data, which can't be cut at arbitrary chunk boundaries. To get the raw audio samples and be able to manipulate them, I need to use an AudioWorkletProcessor.

    This video helped me understand the theory behind it:

    https://www.youtube.com/watch?v=g1L4O1smMC0

    And this article helped me understand how to make use of it: https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Using_AudioWorklet
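For illustration, here is a minimal sketch of that approach rather than the exact code used: a worklet copies each raw block of samples to the main thread, and two plain helpers cut a time range out of the collected blocks and wrap it as a WAV file. The names capture-processor, startCapture, sliceSeconds and encodeWav are all hypothetical, and the sketch assumes mono audio and reads the real sample rate from context.sampleRate.

```javascript
// Worklet source, loaded from a Blob URL so the sketch fits in one file.
const processorSource = `
class CaptureProcessor extends AudioWorkletProcessor {
  process(inputs) {
    const channel = inputs[0][0]; // first input, first channel (mono assumed)
    // Copy the block, since the engine may reuse the underlying buffer.
    if (channel) this.port.postMessage(channel.slice());
    return true; // keep the processor alive
  }
}
registerProcessor('capture-processor', CaptureProcessor);
`;

// Browser only: calls onBlock(Float32Array) for every captured block.
async function startCapture(stream, onBlock) {
  const context = new AudioContext();
  const blob = new Blob([processorSource], { type: 'application/javascript' });
  await context.audioWorklet.addModule(URL.createObjectURL(blob));
  const node = new AudioWorkletNode(context, 'capture-processor');
  node.port.onmessage = (e) => onBlock(e.data);
  context.createMediaStreamSource(stream).connect(node);
  // The processor never writes its outputs, so this only emits silence.
  // Connecting onward is a conservative choice so the node is pulled.
  node.connect(context.destination);
  return context;
}

// Joins the blocks and returns the samples in [startSec, endSec).
function sliceSeconds(blocks, sampleRate, startSec, endSec) {
  const total = blocks.reduce((n, block) => n + block.length, 0);
  const all = new Float32Array(total);
  let offset = 0;
  for (const block of blocks) {
    all.set(block, offset);
    offset += block.length;
  }
  return all.slice(Math.round(startSec * sampleRate), Math.round(endSec * sampleRate));
}

// Wraps mono float samples in a 16-bit PCM WAV container (ArrayBuffer).
function encodeWav(samples, sampleRate) {
  const view = new DataView(new ArrayBuffer(44 + samples.length * 2));
  const writeText = (pos, text) => {
    for (let i = 0; i < text.length; i++) view.setUint8(pos + i, text.charCodeAt(i));
  };
  writeText(0, 'RIFF');
  view.setUint32(4, 36 + samples.length * 2, true);
  writeText(8, 'WAVE');
  writeText(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // format 1 = PCM
  view.setUint16(22, 1, true); // one channel
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeText(36, 'data');
  view.setUint32(40, samples.length * 2, true);
  samples.forEach((sample, i) => {
    const s = Math.max(-1, Math.min(1, sample)); // clamp to [-1, 1]
    view.setInt16(44 + i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  });
  return view.buffer;
}
```

With the blocks collected in an array, something like new Blob([encodeWav(sliceSeconds(blocks, context.sampleRate, 2, 5), context.sampleRate)], { type: 'audio/wav' }) should yield a standalone file for seconds 2 to 5 that can be uploaded, saved, or passed to decodeAudioData. Because every WAV built this way carries its own header, any piece can be decoded on its own, unlike the MediaRecorder chunks.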