Downsample PCM audio from 44100 to 8000


I've been working on an audio-recognition demo for some time, and the API needs me to pass a .wav file with a sample rate of 8000 or 16000, so I have to downsample it. I have tried the two algorithms below. Neither of them solves the problem, but their results differ, which I hope makes the problem clearer.

This is my first try. It works fine when sampleRate % outputSampleRate = 0; however, when outputSampleRate = 8000 or 16000, the resulting audio file is silent (every element of the output array is 0):

function interleave(inputL){
  var compression = sampleRate / outputSampleRate;
  var length = inputL.length / compression;
  var result = new Float32Array(length);

  var index = 0,
  inputIndex = 0;

  while (index < length){
    result[index++] = inputL[inputIndex];
    inputIndex += compression;
  }
  return result;
}
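
One likely reason the first attempt goes silent at non-integer ratios (an observation added here, not something stated in the question): `inputIndex` becomes fractional, and reading a typed array at a non-integer index returns `undefined`, which is stored as `NaN` in the `Float32Array`. The sketch below reproduces that with a hypothetical `naivePick` and compares it with a hypothetical `flooredPick` that floors the read position. Flooring only fixes the invalid index; it does nothing about aliasing.

```javascript
// Mirrors the indexing of the first attempt (hypothetical helper name).
function naivePick(input, inRate, outRate) {
  var step = inRate / outRate;                 // 5.5125 for 44100 -> 8000
  var length = Math.floor(input.length / step);
  var result = new Float32Array(length);
  var pos = 0;
  for (var i = 0; i < length; i++) {
    result[i] = input[pos];                    // input[5.5125] is undefined, stored as NaN
    pos += step;
  }
  return result;
}

// Same loop, but the read position is floored to a valid integer index.
function flooredPick(input, inRate, outRate) {
  var step = inRate / outRate;
  var length = Math.floor(input.length / step);
  var result = new Float32Array(length);
  var pos = 0;
  for (var i = 0; i < length; i++) {
    result[i] = input[Math.floor(pos)];
    pos += step;
  }
  return result;
}

// A short ramp stands in for real audio.
var ramp = new Float32Array(441);
for (var k = 0; k < ramp.length; k++) ramp[k] = k / 441;

var bad = naivePick(ramp, 44100, 8000);
var good = flooredPick(ramp, 44100, 8000);
console.log(Number.isNaN(bad[1]), Number.isNaN(good[1])); // true false
```

With an integer ratio such as 44100 -> 22050 the step is exactly 2, every index stays an integer, and no `NaN` appears, which matches the behavior described above.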

Here's my second try, which comes from a giant company, and it doesn't work either. What's more, even when sampleRate % outputSampleRate = 0 it still outputs a silent file:

function interleave(e){
  var t = e.length;
  var n = new Float32Array(t),
    r = 0,
    i;
  for (i = 0; i < e.length; i++){
    n[r] = e[i];
    r += e[i].length;
  }
  sampleRate += 0.0;
  outputSampleRate += 0.0;
  var s = 0,
  o = sampleRate / outputSampleRate,
  u = Math.ceil(t * outputSampleRate / sampleRate),
  a = new Float32Array(u);
  for (i = 0; i < u; i++) {
    a[i] = n[Math.floor(s)];
    s += o;
  }

  return a
}

In case my settings were wrong, here's the encodeWAV function:

function encodeWAV(samples){
  var sampleBits = 16;
  var dataLength = samples.length*(sampleBits/8);

  var buffer = new ArrayBuffer(44 + dataLength);
  var view = new DataView(buffer);

  var offset = 0;

  /* RIFF identifier */
  writeString(view, offset, 'RIFF'); offset += 4;
  /* RIFF chunk size: total file size minus the 8-byte RIFF header, i.e. 36 + data length */
  view.setUint32(offset, 36 + dataLength, true); offset += 4;
  /* RIFF type */
  writeString(view, offset, 'WAVE'); offset += 4;
  /* format chunk identifier */
  writeString(view, offset, 'fmt '); offset += 4;
  /* format chunk length */
  view.setUint32(offset, 16, true); offset += 4;
  /* sample format (raw) */
  view.setUint16(offset, 1, true); offset += 2;
  /* channel count */
  view.setUint16(offset, outputChannels, true); offset += 2;
  /* sample rate */
  view.setUint32(offset, outputSampleRate, true); offset += 4;
  /* byte rate (sample rate * block align) */
  view.setUint32(offset, outputSampleRate*outputChannels*(sampleBits/8), true); offset += 4;
  /* block align (channel count * bytes per sample) */
  view.setUint16(offset, outputChannels*(sampleBits/8), true); offset += 2;
  /* bits per sample */
  view.setUint16(offset, sampleBits, true); offset += 2;
  /* data chunk identifier */
  writeString(view, offset, 'data'); offset += 4;
  /* data chunk length */
  view.setUint32(offset, dataLength, true); offset += 4;

  floatTo16BitPCM(view, offset, samples);

  return view;
}
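
encodeWAV calls two helpers, `writeString` and `floatTo16BitPCM`, that the question does not show. The sketch below is only an assumption about what they do, following the usual pattern for this kind of recorder code; it is not the asker's actual implementation.

```javascript
// Assumed helper: writes an ASCII string into the DataView one byte at a time.
function writeString(view, offset, string) {
  for (var i = 0; i < string.length; i++) {
    view.setUint8(offset + i, string.charCodeAt(i));
  }
}

// Assumed helper: converts floats in [-1, 1] to little-endian signed 16-bit PCM.
function floatTo16BitPCM(view, offset, input) {
  for (var i = 0; i < input.length; i++, offset += 2) {
    var s = Math.max(-1, Math.min(1, input[i]));   // clamp; a NaN sample stays NaN
    // Negative values scale by 32768, positive values by 32767.
    view.setInt16(offset, s < 0 ? s * 0x8000 : s * 0x7FFF, true);
  }
}
```

Under these assumptions a `NaN` sample is written as 0 by `setInt16`, so an input array full of `NaN` values would produce a silent file rather than an error.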

This has confused me for a very long time; please let me know what I missed.

-----------------------------AFTER IT'S SOLVED--------------------------------

I'm glad it's running well now. Here's the corrected version of the function interleave():

    function interleave(e){
      var t = e.length;
      sampleRate += 0.0;
      outputSampleRate += 0.0;
      var s = 0,
      o = sampleRate / outputSampleRate,
      u = Math.ceil(t * outputSampleRate / sampleRate),
      a = new Float32Array(u);
      for (var i = 0; i < u; i++) {
        a[i] = e[Math.floor(s)];
        s += o;
      }

      return a;
    }

As you can see, the variable I passed to it was not of the proper type. Thanks again to @jaket and the other friends. Though I figured it out myself in the end, they helped me understand the underlying issues better. :)


Solution

  • There is a lot more to sample rate conversion than just simply throwing samples away or inserting them.

    Let's take a simple case of downsampling by a factor of 2 (e.g. 44100->22050). A naive approach would be to just throw away every other sample. But imagine for a second that the original 44.1 kHz file contains a single sine wave at 20 kHz. That is well within Nyquist (fs/2=22050) for that sample rate. After you throw every other sample away, the tone is still there at 20 kHz, but now it is above Nyquist (fs/2=11025) and it will alias into your output signal. The final result is a big fat sine wave sitting at 2050 Hz (22050 - 20000)!
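
The folding arithmetic can be checked with a few lines. This is a minimal sketch; `aliasedFrequency` is a hypothetical helper name, and it assumes a single pure tone sampled with no anti-aliasing filter.

```javascript
// Frequency at which a pure tone shows up after sampling at `fs`
// when nothing has filtered it out beforehand.
function aliasedFrequency(toneHz, fs) {
  var folded = toneHz % fs;                        // f and f + k*fs are indistinguishable
  return folded > fs / 2 ? fs - folded : folded;   // reflect anything above Nyquist
}

console.log(aliasedFrequency(20000, 44100)); // 20000 (below Nyquist, unchanged)
console.log(aliasedFrequency(20000, 22050)); // 2050  (folded back into the band)
```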

    To avoid this aliasing during downsampling, you first need to design a lowpass filter with a cutoff selected according to your decimation ratio. For the example above you would cut off everything above 11025 Hz first and then decimate.
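
A minimal sketch of "lowpass first, then decimate" for an integer factor. The filter type (Hamming-windowed sinc), the default of 63 taps, and the cutoff at 0.45 of the new sample rate are all assumptions made for illustration, and `designLowpass` and `decimate` are hypothetical names.

```javascript
// Windowed-sinc lowpass FIR. `cutoff` is a fraction of the input sample rate
// (0 < cutoff < 0.5); `numTaps` should be odd so there is a center tap.
function designLowpass(numTaps, cutoff) {
  var taps = new Float64Array(numTaps);
  var mid = (numTaps - 1) / 2;
  var sum = 0;
  for (var n = 0; n < numTaps; n++) {
    var x = n - mid;
    var sinc = x === 0 ? 2 * cutoff : Math.sin(2 * Math.PI * cutoff * x) / (Math.PI * x);
    var hamming = 0.54 - 0.46 * Math.cos(2 * Math.PI * n / (numTaps - 1));
    taps[n] = sinc * hamming;
    sum += taps[n];
  }
  for (var k = 0; k < numTaps; k++) taps[k] /= sum;   // normalize to unity gain at DC
  return taps;
}

// Filters the input and keeps every `factor`-th sample.
// Only the samples that survive decimation are actually computed.
function decimate(input, factor, numTaps) {
  var taps = designLowpass(numTaps || 63, 0.45 / factor); // slightly below the new Nyquist
  var mid = (taps.length - 1) / 2;
  var output = new Float32Array(Math.floor(input.length / factor));
  for (var i = 0; i < output.length; i++) {
    var center = i * factor;
    var acc = 0;
    for (var t = 0; t < taps.length; t++) {
      var idx = center + t - mid;
      if (idx >= 0 && idx < input.length) acc += taps[t] * input[idx]; // zero-pad the edges
    }
    output[i] = acc;
  }
  return output;
}

// Usage: halve the rate of one second of a 1 kHz tone sampled at 44100 Hz.
var input = new Float32Array(44100);
for (var n = 0; n < input.length; n++) input[n] = Math.sin(2 * Math.PI * 1000 * n / 44100);
var halved = decimate(input, 2);
console.log(halved.length); // 22050
```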

    The flip side of the coin is called upsampling and interpolation. Say you want to increase the sample rate by a factor of 2. First you insert zeros between every input sample and then run an interpolation filter to compute values to replace the zeros using the surrounding samples.
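
The two steps described above, zero insertion followed by an interpolation filter, can be sketched as follows. The triangular kernel used here is the simplest possible interpolation filter (it amounts to linear interpolation) and is an assumption chosen for brevity; a real converter would use a longer lowpass. `zeroStuff` and `upsampleLinear` are hypothetical names.

```javascript
// Step 1: insert (factor - 1) zeros after every input sample.
function zeroStuff(input, factor) {
  var out = new Float32Array(input.length * factor);
  for (var i = 0; i < input.length; i++) out[i * factor] = input[i];
  return out;
}

// Step 2: run an interpolation filter over the zero-stuffed signal.
// The kernel h[k] = 1 - |k| / factor replaces each zero with a weighted
// average of the two nearest original samples.
function upsampleLinear(input, factor) {
  var stuffed = zeroStuff(input, factor);
  var out = new Float32Array(stuffed.length);
  for (var n = 0; n < stuffed.length; n++) {
    var acc = 0;
    for (var k = -(factor - 1); k <= factor - 1; k++) {
      var idx = n + k;
      if (idx >= 0 && idx < stuffed.length) {
        acc += (1 - Math.abs(k) / factor) * stuffed[idx];
      }
    }
    out[n] = acc;
  }
  return out;
}

var doubled = upsampleLinear(new Float32Array([0, 2, 4]), 2);
console.log(Array.from(doubled)); // [0, 1, 2, 3, 4, 2]
```

The final value tapers to 2 because there is no sample after 4 to interpolate toward.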

    Rate changing usually involves some combination of decimation and interpolation, since both work by integer factors. Take 48000->32000 as an example. The output/input ratio is 32000/48000 or 2/3. So you'd upsample 48000 by 2 to get 96000 and then downsample that by 3 to get 32000. You can also chain these processes together. So if you want to go from 48000->16000 you'd go up 2, down 3, down 2. Also, 44100 is particularly difficult. For example, to move from 48000->44100 you need to go up 147, down 160, and you can't reduce that to smaller terms.
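
The up/down factors fall out of reducing the rate ratio by the greatest common divisor. A small sketch, with `gcd` and `resampleRatio` as hypothetical helper names, applied to the rates mentioned in this thread:

```javascript
// Euclid's algorithm for the greatest common divisor.
function gcd(a, b) {
  while (b !== 0) {
    var t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Reduced interpolation (up) and decimation (down) factors for inRate -> outRate.
function resampleRatio(inRate, outRate) {
  var g = gcd(inRate, outRate);
  return { up: outRate / g, down: inRate / g };
}

console.log(resampleRatio(48000, 32000)); // { up: 2, down: 3 }
console.log(resampleRatio(48000, 44100)); // { up: 147, down: 160 }
console.log(resampleRatio(44100, 8000));  // { up: 80, down: 441 }
console.log(resampleRatio(44100, 16000)); // { up: 160, down: 441 }
```

The last two lines are the conversions the question asks about; neither is an integer ratio, which is why simply picking every n-th sample cannot work for them.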

    I'd suggest you find some code or a library to do this for you. What you need to look for is a polyphase filter or sample rate converter.
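
For readers who want to see the idea before reaching for a library, here is a rough sketch of band-limited resampling for an arbitrary ratio such as 44100->8000. It evaluates a Hann-windowed sinc kernel directly at each output instant instead of using a polyphase structure, so it is slow and only illustrative. The function name `resample`, the Hann window, and the default `halfWidth` of 16 zero crossings are assumptions, not part of the answer above.

```javascript
// Band-limited resampling with a Hann-windowed sinc kernel.
// `halfWidth` is the kernel half-length measured in sinc zero crossings.
function resample(input, inRate, outRate, halfWidth) {
  halfWidth = halfWidth || 16;
  // Lowpass cutoff as a fraction of the input Nyquist: when downsampling,
  // this places the cutoff at the output Nyquist.
  var cutoff = Math.min(1, outRate / inRate);
  var radius = halfWidth / cutoff;               // kernel half-length in input samples
  var output = new Float32Array(Math.floor(input.length * outRate / inRate));
  for (var n = 0; n < output.length; n++) {
    var t = n * inRate / outRate;                // output instant in input-sample units
    var first = Math.max(0, Math.ceil(t - radius));
    var last = Math.min(input.length - 1, Math.floor(t + radius));
    var acc = 0;
    for (var k = first; k <= last; k++) {
      var x = (t - k) * cutoff;                  // kernel argument, |x| <= halfWidth
      var sinc = x === 0 ? 1 : Math.sin(Math.PI * x) / (Math.PI * x);
      var hann = 0.5 + 0.5 * Math.cos(Math.PI * x / halfWidth);
      acc += input[k] * sinc * hann * cutoff;    // scaling by cutoff keeps passband gain near 1
    }
    output[n] = acc;
  }
  return output;
}

// Usage: 0.1 s of a 1 kHz tone at 44100 Hz, converted to 8000 Hz.
var source = new Float32Array(4410);
for (var j = 0; j < source.length; j++) source[j] = Math.sin(2 * Math.PI * 1000 * j / 44100);
var converted = resample(source, 44100, 8000);
console.log(converted.length); // 800
```

Because the cutoff sits exactly at the output Nyquist, content in the transition band just below 4 kHz can still alias slightly; a production converter would choose the cutoff and kernel length more carefully.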