Tags: javascript, html, video, canvas

JavaScript: Extract video frames reliably


I'm working on a client-side project which lets a user supply a video file and apply basic manipulations to it. I'm trying to extract the frames from the video reliably. At the moment I have a <video> which I'm loading the selected video into, and then pulling out each frame as follows:

  1. Seek to the beginning
  2. Pause the video
  3. Draw <video> to a <canvas>
  4. Capture the frame from the canvas with .toDataURL()
  5. Seek forward by 1/30 of a second (one frame, assuming 30 fps).
  6. Rinse and repeat

This is a rather inefficient process and, more specifically, it is proving unreliable, as I often get stuck frames. This seems to be because the actual <video> element isn't updated before it is drawn to the canvas.
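
A simplified sketch of such a seek-and-draw loop (the helper names and the hard-coded 30 fps are illustrative only; even waiting for the seeked event like this doesn't always guarantee the element has repainted before drawImage runs):

  function seekTo(video, time) {
    return new Promise((resolve) => {
      if (video.currentTime === time) return resolve(); // seeking to the same time may not fire "seeked"
      video.addEventListener("seeked", resolve, { once: true });
      video.currentTime = time;
    });
  }

  async function extractFrames(video, frameRate = 30) {
    const canvas = document.createElement("canvas");
    canvas.width = video.videoWidth;
    canvas.height = video.videoHeight;
    const ctx = canvas.getContext("2d");
    const frames = [];
    video.pause();
    for (let t = 0; t < video.duration; t += 1 / frameRate) {
      await seekTo(video, t);           // wait for the seek to complete
      ctx.drawImage(video, 0, 0);       // draw whatever the element currently shows
      frames.push(canvas.toDataURL());  // capture the frame as a data URL
    }
    return frames;
  }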

I'd rather not have to upload the original video to the server just to split the frames, and then download them back to the client.

Any suggestions for a better way to do this are greatly appreciated. The only caveat is that I need it to work with any format the browser supports (decoding in JS isn't a great option).

Update:

In my particular case, this was eventually solved by using a stripped-down ffmpeg build, compiled to asm.js with Emscripten. That was a tedious process which more than tripled the page size on the client, and it was still somewhat slow. The other answers and discussion on this question are much better options in 2020 and onward.


Solution

  • [2021 update]: Since this question (and answer) was first posted, things have evolved in this area, and it is finally time for an update; the method that was originally described here is now out of date, but luckily a few new or incoming APIs can help us do a better job of extracting video frames:

    The most promising and powerful one, but still under development, with a lot of restrictions: WebCodecs

    This new API exposes the media decoders and encoders, giving us access to the raw data of video frames (YUV planes), which may be far more useful for many applications than rendered frames; and for those who do need rendered frames, the VideoFrame interface that this API exposes can be drawn directly to a <canvas> element or converted to an ImageBitmap, avoiding the slow route through the MediaElement.
    However, there is a catch: apart from its currently limited support, this API requires that the input has already been demuxed.
    There are demuxers available online; for instance, for MP4 videos, GPAC's mp4box.js will help a lot.

    A full example can be found on the proposal's repo.

    The key part consists of:

    const decoder = new VideoDecoder({
      output: onFrame, // the callback to handle all the VideoFrame objects
      error: e => console.error(e),
    });
    decoder.configure(config); // depends on the input file, your demuxer should provide it
    demuxer.start((chunk) => { // depends on the demuxer, but you need it to return chunks of encoded video data
      decoder.decode(chunk); // will trigger our onFrame callback
    });
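
    Here onFrame is a handler we define ourselves (the name is just carried over from the snippet above); for those who want rendered frames rather than raw data, it could simply paint each decoded VideoFrame to a canvas, for example (a minimal sketch, assuming a <canvas> element is on the page):

    const canvas = document.querySelector("canvas");
    const ctx = canvas.getContext("2d");

    function onFrame(videoFrame) {
      canvas.width = videoFrame.displayWidth;
      canvas.height = videoFrame.displayHeight;
      ctx.drawImage(videoFrame, 0, 0); // a VideoFrame is a valid CanvasImageSource
      videoFrame.close();              // free the frame's resources so the decoder doesn't stall
    }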
    

    Note that we can also grab the frames of a MediaStream, thanks to MediaCapture Transform's MediaStreamTrackProcessor. This means that we should be able to combine HTMLMediaElement.captureStream() with this API to get our VideoFrames without the need for a demuxer. However, this is true only for a few codecs, and it means that we can only extract frames at playback speed...
    Anyway, here is an example that works in the latest Chromium-based browsers, with chrome://flags/#enable-experimental-web-platform-features switched on:

    const frames = [];
    const button = document.querySelector("button");
    const select = document.querySelector("select");
    const canvas = document.querySelector("canvas");
    const ctx = canvas.getContext("2d");
    
    button.onclick = async(evt) => {
      if (window.MediaStreamTrackProcessor) {
        let stopped = false;
        const track = await getVideoTrack();
        const processor = new MediaStreamTrackProcessor(track);
        const reader = processor.readable.getReader();
        readChunk();
    
        function readChunk() {
          reader.read().then(async({ done, value }) => {
            if (value) {
              const bitmap = await createImageBitmap(value);
              const index = frames.length;
              frames.push(bitmap);
              select.append(new Option("Frame #" + (index + 1), index));
              value.close();
            }
            if (!done && !stopped) {
              readChunk();
            } else {
              select.disabled = false;
            }
          });
        }
        button.onclick = (evt) => stopped = true;
        button.textContent = "stop";
      } else {
        console.error("your browser doesn't support this API yet");
      }
    };
    
    select.onchange = (evt) => {
      const frame = frames[select.value];
      canvas.width = frame.width;
      canvas.height = frame.height;
      ctx.drawImage(frame, 0, 0);
    };
    
    async function getVideoTrack() {
      const video = document.createElement("video");
      video.crossOrigin = "anonymous";
      video.src = "https://upload.wikimedia.org/wikipedia/commons/a/a4/BBH_gravitational_lensing_of_gw150914.webm";
      document.body.append(video);
      await video.play();
      const [track] = video.captureStream().getVideoTracks();
      video.onended = (evt) => track.stop();
      return track;
    }
    video,canvas {
      max-width: 100%
    }
    <button>start</button>
    <select disabled>
    </select>
    <canvas></canvas>

    The easiest to use, but still with relatively poor browser support, and subject to the browser dropping frames: HTMLVideoElement.requestVideoFrameCallback

    This method lets us schedule a callback that fires whenever a new frame is painted on the HTMLVideoElement.
    It is higher level than WebCodecs, and thus may have more latency; moreover, with it we can only extract frames at playback speed.

    const frames = [];
    const button = document.querySelector("button");
    const select = document.querySelector("select");
    const canvas = document.querySelector("canvas");
    const ctx = canvas.getContext("2d");
    
    button.onclick = async(evt) => {
      if (HTMLVideoElement.prototype.requestVideoFrameCallback) {
        let stopped = false;
        const video = await getVideoElement();
        const drawingLoop = async(timestamp, frame) => {
          const bitmap = await createImageBitmap(video);
          const index = frames.length;
          frames.push(bitmap);
          select.append(new Option("Frame #" + (index + 1), index));
    
          if (!video.ended && !stopped) {
            video.requestVideoFrameCallback(drawingLoop);
          } else {
            select.disabled = false;
          }
        };
    // the last rVFC callback can be scheduled before .ended is set and then never fire
        video.onended = (evt) => select.disabled = false;
        video.requestVideoFrameCallback(drawingLoop);
        button.onclick = (evt) => stopped = true;
        button.textContent = "stop";
      } else {
        console.error("your browser doesn't support this API yet");
      }
    };
    
    select.onchange = (evt) => {
      const frame = frames[select.value];
      canvas.width = frame.width;
      canvas.height = frame.height;
      ctx.drawImage(frame, 0, 0);
    };
    
    async function getVideoElement() {
      const video = document.createElement("video");
      video.crossOrigin = "anonymous";
      video.src = "https://upload.wikimedia.org/wikipedia/commons/a/a4/BBH_gravitational_lensing_of_gw150914.webm";
      document.body.append(video);
      await video.play();
      return video;
    }
    video,canvas {
      max-width: 100%
    }
    <button>start</button>
    <select disabled>
    </select>
    <canvas></canvas>

    For your Firefox users, Mozilla's non-standard HTMLMediaElement.seekToNextFrame()

    As its name implies, this makes your <video> element seek to the next frame.
    Combining it with the seeked event, we can build a loop that grabs every frame of our source, faster than playback speed (yeah!).
    But this method is proprietary, available only in Gecko-based browsers, not on any standards track, and it will probably be removed in the future once the methods described above are implemented.
    For the time being, though, it is the best option for Firefox users:

    const frames = [];
    const button = document.querySelector("button");
    const select = document.querySelector("select");
    const canvas = document.querySelector("canvas");
    const ctx = canvas.getContext("2d");
    
    button.onclick = async(evt) => {
      if (HTMLMediaElement.prototype.seekToNextFrame) {
        let stopped = false;
        const video = await getVideoElement();
        const requestNextFrame = (callback) => {
          video.addEventListener("seeked", () => callback(video.currentTime), {
            once: true
          });
          video.seekToNextFrame();
        };
        const drawingLoop = async(timestamp, frame) => {
          if(video.ended) {
            select.disabled = false;
            return; // FF apparently doesn't like to create ImageBitmaps
                    // from ended videos...
          }
          const bitmap = await createImageBitmap(video);
          const index = frames.length;
          frames.push(bitmap);
          select.append(new Option("Frame #" + (index + 1), index));
    
          if (!video.ended && !stopped) {
            requestNextFrame(drawingLoop);
          } else {
            select.disabled = false;
          }
        };
        requestNextFrame(drawingLoop);
        button.onclick = (evt) => stopped = true;
        button.textContent = "stop";
      } else {
        console.error("your browser doesn't support this API yet");
      }
    };
    
    select.onchange = (evt) => {
      const frame = frames[select.value];
      canvas.width = frame.width;
      canvas.height = frame.height;
      ctx.drawImage(frame, 0, 0);
    };
    
    async function getVideoElement() {
      const video = document.createElement("video");
      video.crossOrigin = "anonymous";
      video.src = "https://upload.wikimedia.org/wikipedia/commons/a/a4/BBH_gravitational_lensing_of_gw150914.webm";
      document.body.append(video);
      await video.play();
      return video;
    }
    video,canvas {
      max-width: 100%
    }
    <button>start</button>
    <select disabled>
    </select>
    <canvas></canvas>

    The least reliable, and one that has stopped working over time: HTMLVideoElement.ontimeupdate

    The pause - draw - play - wait for timeupdate strategy used to be (in 2015) a fairly reliable way to know when a new frame had been painted to the element, but since then browsers have put serious limitations on this event, which used to fire at a high rate, and now there isn't much information we can get from it...
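
    For reference, that legacy strategy looked roughly like the following sketch (the function names here are made up; modern timeupdate throttling means many frames will simply be skipped):

    // pause - draw - play - wait for timeupdate, then repeat
    function legacyTimeupdateLoop(video, ctx, onCapture) {
      const step = () => {
        video.pause();
        ctx.drawImage(video, 0, 0);
        onCapture(ctx.canvas.toDataURL());
        if (video.ended) return;
        video.addEventListener("timeupdate", step, { once: true });
        video.play();
      };
      step();
    }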

    I am not sure I can still advocate for its use. I haven't checked how Safari (currently the only browser without a solution) handles this event (their media handling is very strange to me), and there is a good chance that a simple setTimeout(fn, 1000 / 30) loop is actually more reliable in most cases.
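
    A sketch of that setTimeout-based fallback (again with illustrative names; it only samples whatever the element happens to be displaying at each tick):

    function sampleAtInterval(video, onFrame, fps = 30) {
      const grab = async () => {
        if (video.ended) return;                  // stop once playback is over
        onFrame(await createImageBitmap(video));  // grab the currently displayed frame
        setTimeout(grab, 1000 / fps);
      };
      setTimeout(grab, 1000 / fps);               // assumes the video is already playing
    }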