Given a plain web video of, say, 30 seconds:
<video src="my-video.mp4"></video>
How could I generate its volume level chart?
volume|
 level|    ******
      |   *      *
      |  *        *
      |**          *       *
      | *           *      * *
      +---------------*-*-----************------+--- time
      0                                         30s
         video is             and quiet
         loud here            here
Note:
There are several ways to do this, depending on the intended usage.
For accuracy you could measure in conventional loudness values and units such as RMS, LUFS/LKFS (K-weighted loudness), dBFS (decibels relative to full scale) and so forth.
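As a point of reference, here is a minimal sketch of how RMS over a window of float PCM samples (range -1..1) maps to dBFS; the function names are placeholders, and the full example further down does the same calculation per window:

// Minimal sketch: RMS of a window of Float32 PCM samples, converted to dBFS.
function rmsOfWindow(samples) {
  var sum = 0;
  for (var i = 0; i < samples.length; i++) sum += samples[i] * samples[i];
  return Math.sqrt(sum / samples.length);
}
function toDBFS(rms) {
  return 20 * Math.log10(rms); // 0 dBFS = full scale; quieter signals are negative
}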
The simple, naive approach is to just plot the peaks of the waveform; you would be interested in the positive values only. To get just the peaks, you detect the direction between two points and log the first point when the direction changes from upward to downward (p0 > p1).
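A rough sketch of that idea (the function and its Float32Array input are illustrative, not part of the example further down):

// Naive positive-peak picker: log a sample when the direction turns
// from upward to downward.
function findPeaks(samples) {
  var peaks = [];
  for (var i = 1; i < samples.length - 1; i++) {
    var p0 = samples[i], p1 = samples[i + 1];
    if (p0 > 0 && samples[i - 1] <= p0 && p0 > p1) { // rising, then falling
      peaks.push({ index: i, value: p0 });
    }
  }
  return peaks;
}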
For all approaches you can finally apply some form of smoothing, such as a weighted moving average (example) or a generic smoothing algorithm, to remove small peaks and changes. In the case of RMS, dB etc. you would use a window size, which can be combined with bin-smoothing (an average per segment).
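For instance, a basic linearly-weighted moving average might look like this (just one possible weighting scheme; the full example below uses bin-smoothing instead):

// Simple weighted moving average: more recent values weigh more.
// "values" could be the peak or per-window dB readings collected earlier.
function weightedMovingAverage(values, windowSize) {
  var out = [];
  for (var i = 0; i < values.length; i++) {
    var sum = 0, weightSum = 0;
    for (var j = 0; j < windowSize && i - j >= 0; j++) {
      var w = windowSize - j;   // linear weights, current sample heaviest
      sum += values[i - j] * w;
      weightSum += w;
    }
    out.push(sum / weightSum);
  }
  return out;
}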
To plot, you obtain the value for the current sample, assume it is normalized, and draw it as a line or point on a canvas, scaled by the plot area height.
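In its simplest form that plotting step could be something like the following (the canvas and the array of normalized 0..1 values are assumed to exist; the full example below does the same with per-window dB values):

// Plot normalized values (0..1) to a canvas, scaled by the plot area height.
function plotValues(canvas, values) {
  var pctx = canvas.getContext("2d");
  pctx.beginPath();
  pctx.moveTo(0, canvas.height);
  for (var x = 0; x < values.length && x < canvas.width; x++) {
    pctx.lineTo(x, canvas.height - values[x] * canvas.height); // 0 = bottom of plot
  }
  pctx.stroke();
}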
To address some of the questions in the comments (these are just off the top of my head, to give some pointers):

Since the Web Audio API cannot do streaming on its own, you have to load the entire file into memory and decode the audio track into a buffer.*

*: There is always the option of storing the downloaded video as a blob in IndexedDB (with its implications) and using an object URL with that blob to stream it into the video element (this may require MSE to work properly; I haven't tried it myself). See the rough sketch after this list.

Other approaches:
- Plotting while streaming
- Side-loading a low-quality mono audio-only file
- Server-side plotting

I might have left out or missed some points, but it should give the general idea...
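For the blob/object-URL idea in the footnote above, an untested, minimal sketch might look like this (the URL and element selection are placeholders; persisting the blob to IndexedDB and any MSE handling are left out):

// Sketch only: download the media once, then feed the <video> element from a
// local blob via an object URL.
fetch("my-video.mp4", { mode: "cors" })
  .then(function(resp) { return resp.blob(); })
  .then(function(blob) {
    // The blob could also be stored in IndexedDB here for later sessions.
    var video = document.querySelector("video");
    video.src = URL.createObjectURL(blob);
    // The audio data for plotting can be decoded from the same blob
    // (blob.arrayBuffer() -> decodeAudioData) so the file is only fetched once.
  });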
This example measures conventional dB over a given window size, per sample. The bigger the window size, the smoother the result, but the longer it takes to calculate.
Note that for simplicity, pixel position determines the dB window range in this example. Depending on the buffer size this may produce uneven gaps/overlaps that affect the current sample value, but it should work for the purpose demonstrated here. Also for simplicity, I am scaling the dB reading by dividing it by 40, a somewhat arbitrary number here (the Math.abs() is just for the plotting, and for how my brain worked in the late night/early morning when I made this :) ).
I added bin/segment-smoothing in red on top to better show longer-term audio variations relevant to things such as auto-leveling.
I'm using an audio source here, but you can plug in a video source instead, as long as it contains an audio track in a format that can be decoded (AAC, MP3, Ogg etc.).
Apart from that, the example is just that: an example. It's not production code, so take it for what it's worth and make adjustments as needed.
(For some reason the audio won't play in Firefox 58 beta, though it will plot. Audio plays in Chrome and Firefox 58 dev.)
var ctx = c.getContext("2d"), ref, audio;
var actx = new (window.AudioContext || window.webkitAudioContext)();
var url = "//dl.dropboxusercontent.com/s/a6s1qq4lnwj46uj/testaudiobyk3n_lo.mp3";
ctx.font = "20px sans-serif";
ctx.fillText("Loading and processing...", 10, 50);
ctx.fillStyle = "#001730";
// Load audio
fetch(url, {mode: "cors"})
.then(function(resp) {return resp.arrayBuffer()})
.then(actx.decodeAudioData.bind(actx))
.then(function(buffer) {
  // Get data from channel 0 (you will want to measure all channels / average them)
  var channel = buffer.getChannelData(0);
  // dB per window + plot
  var points = [0];
  ctx.clearRect(0, 0, c.width, c.height);
  ctx.moveTo(0, c.height);                    // start the fill path at the bottom-left
  for (var x = 1, i, v; x < c.width; x++) {
    i = ((x / c.width) * channel.length) | 0; // get index in buffer based on x
    v = Math.abs(dB(channel, i, 8820)) / 40;  // 200 ms window (at 44.1 kHz), normalize
    ctx.lineTo(x, c.height * v);
    points.push(v);
  }
  ctx.fill();
  // smooth using bins
  var bins = 40;                    // number of segments
  var range = (c.width / bins) | 0; // points per segment
  var sum;
  ctx.beginPath();
  ctx.moveTo(0, c.height);
  for (x = 0; x < points.length;) {               // x is advanced by the inner loop
    for (v = 0, i = 0; i < range && x < points.length; i++) {
      v += points[x++];
    }
    sum = v / i;                                  // average of the points actually read
    ctx.lineTo(x - (range >> 1), sum * c.height); // -range/2 to compensate visually
  }
  ctx.lineWidth = 2;
  ctx.strokeStyle = "#c00";
  ctx.stroke();
  // for audio / progress bar only
  c.style.backgroundImage = "url(" + c.toDataURL() + ")";
  c.width = c.width;                // resets the canvas, keeping the plot as background
  ctx.fillStyle = "#c00";
  audio = document.querySelector("audio");
  audio.onplay = start;
  audio.onended = stop;
  audio.style.display = "block";
});
// calculates RMS over a winSize-sample window ending at pos and returns dB
function dB(buffer, pos, winSize) {
  for (var rms, sum = 0, v, i = pos - winSize + 1; i <= pos; i++) {
    v = i < 0 ? 0 : buffer[i];
    sum += v * v;
  }
  rms = Math.sqrt(sum / winSize); // root of the mean of the squares
  return 20 * Math.log10(rms);
}
// for progress bar (audio)
function start() { if (!ref) ref = requestAnimationFrame(progress); }
function stop() { cancelAnimationFrame(ref); ref = null; }
function progress() {
  var x = audio.currentTime / audio.duration * c.width;
  ctx.clearRect(0, 0, c.width, c.height);
  ctx.fillRect(x - 1, 0, 2, c.height);
  ref = requestAnimationFrame(progress);
}
body {background:#536375}
#c {border:1px solid;background:#7b8ca0}
<canvas id=c width=640 height=300></canvas><br>
<audio style="display:none" src="//dl.dropboxusercontent.com/s/a6s1qq4lnwj46uj/testaudiobyk3n_lo.mp3" controls></audio>