Tags: javascript, web-worker

Web worker out of memory when processing large array


I'm building an app which among other things has the ability to upload files to an existing API. This API takes both file metadata and contents in a JSON object, so I need to convert the binary contents of the files to base64 encoded strings.

Since this is a potentially heavy operation, I moved the functionality into a web worker. The worker takes in an ArrayBuffer object with the binary file contents (returned from FileReader.readAsArrayBuffer()), and returns a base64 encoded string.
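The API's exact schema isn't shown; as a rough sketch, the kind of JSON payload described might look like this (the `name`, `size`, and `content` field names are assumptions, not the real API):

```javascript
// Hypothetical payload shape -- the real API's field names aren't given
// in the question; `name`, `size`, and `content` are assumptions.
function buildUploadPayload(file, base64Content) {
  return JSON.stringify({
    name: file.name,
    size: file.size,
    content: base64Content, // base64-encoded file bytes
  });
}
```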

This works fine for smaller files, but for the largest files I need to support (~40 MB) it causes out-of-memory exceptions in my worker (8007000E in Internet Explorer). On rare occasions it goes through, but most of the time the worker just dies. The same happened before I moved the code into a worker, except then the entire browser page crashed (in both IE and Chrome). Chrome seems to be a bit more resilient to the memory strain in workers than IE is, but I still have to make it work properly in IE (10+).

My worker:

onmessage = e => {
  const bytes = new Uint8Array(e.data);
  const l = bytes.length;
  const chars = new Array(l);
  // Fill the array from both ends at once to halve the iteration count
  for (let i = 0, j = l - 1; i <= j; ++i, --j) {
    chars[i] = String.fromCharCode(bytes[i]);
    chars[j] = String.fromCharCode(bytes[j]);
  }
  const byteString = chars.join('');
  const base64bytes = btoa(byteString);

  try {
    // Strings aren't transferable, so this transfer-list call can throw...
    postMessage(base64bytes, [base64bytes]);
  } catch (e) {
    // ...in which case fall back to a plain structured-clone postMessage
    postMessage(base64bytes);
  }
};

Am I making some big no-nos here? Are there any ways to reduce the memory consumption? One solution I've thought about would be to process the contents in chunks rather than the whole file, then concatenate the resulting strings and encode it on the outside. Would that be viable, or will that cause problems of its own? Are there any other magical functions I don't know about? I had a glimmer of hope with FileReader.readAsBinaryString(), but it's now removed from the standard (and not supported in IE10 anyway) so I can't use it.
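The chunked idea floated above can be sketched as follows (a minimal sketch, not production code; the Node `Buffer` fallback for `btoa` is only there so the snippet runs outside a browser). The key detail is that the chunk size must be a multiple of 3: base64 pads its output with `=` to a multiple of 4 characters, and padding is only valid at the very end, so only multiple-of-3 chunks can be concatenated safely.

```javascript
// Minimal sketch: base64-encode an ArrayBuffer in small chunks so no
// single intermediate string is ever large.
const b2a = typeof btoa === 'function'
  ? btoa
  : (s) => Buffer.from(s, 'binary').toString('base64'); // Node fallback

function encodeChunked(arrayBuffer, chunkSize = 16383 /* 3 * 5461 */) {
  const bytes = new Uint8Array(arrayBuffer);
  const parts = [];
  for (let i = 0; i < bytes.length; i += chunkSize) {
    const chunk = bytes.subarray(i, i + chunkSize);
    // String.fromCharCode.apply is safe here: the argument count stays
    // far below engine limits because each chunk is small
    parts.push(b2a(String.fromCharCode.apply(null, chunk)));
  }
  return parts.join('');
}
```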

(I realize this question could be relevant at Code Review too, but since my code is actually crashing, I figured SO was the correct place)


Solution

  • One solution I've thought about would be to process the contents in chunks rather than the whole file, then concatenate the resulting strings and encode it on the outside. Would that be viable, or will that cause problems of its own?

    This is what https://github.com/beatgammit/base64-js seems to do, encoding roughly 16 KB at a time. Using this, without transferables (as IE 10 doesn't support them), on my computer Chrome manages to encode a 190 MB ArrayBuffer (larger than this it complains about an invalid string length), while IE 11 manages about 40 MB (larger than this I get an out of memory exception).

    You can see this at https://plnkr.co/edit/SShi1PE4DuMATcyqTRPx?p=preview, where the worker has the code

    // b64.js is written as a CommonJS module; a dummy exports object keeps
    // its `exports.* = ...` assignments from throwing, and its top-level
    // function declarations (including fromByteArray) land in worker scope
    var exports = {};
    importScripts('b64.js');

    onmessage = function(e) {
      var base64Bytes = fromByteArray(new Uint8Array(e.data));
      postMessage(base64Bytes);
    };
    

    and the main thread

    var worker = new Worker('worker.js');
    var length = 1024 * 1024 * 40;
    worker.postMessage(new ArrayBuffer(length));
    
    worker.onmessage = function(e) {
      console.log('Received Base64 in UI thread', e.data.length, 'bytes');
    }
    

    To go beyond the 40 MB limit, one way that seems promising is to pass only a smaller slice to the worker at a time (say 1 MB), encode it, return the result, and only then pass the next slice to the worker, concatenating all the results at the end. I've managed to use this to encode larger buffers (up to 250 MB in IE 11). My suspicion is that the asynchronicity allows the garbage collector to run between invocations.

    For example, at https://plnkr.co/edit/un7TXeHwYu8eBltfYAII?p=preview, with the same code in the worker as above, but with this in the UI thread:

    var worker = new Worker('worker.js');
    var length = 1024 * 1024 * 60;
    var buffer = new ArrayBuffer(length);
    
    // Must be a multiple of 3: each slice is base64-encoded separately,
    // and '=' padding is only valid at the end of the final slice
    var maxMessageLength = 1024 * 1023;
    var i = 0;
    function next() {
      var end = Math.min(i + maxMessageLength, length);
      var copy = buffer.slice(i, end);
      worker.postMessage(copy);
      i = end;
    }
    
    var results = [];
    worker.onmessage = function(e) {
      results.push(e.data);
      if (i < length) {
        next();
      } else {
        results = results.join('');
        alert('done ' + results.length);
      }
    };
    
    next();
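One subtlety in this slice-and-concatenate approach: because the worker base64-encodes each slice independently, the per-slice length has to be a multiple of 3, otherwise `=` padding lands in the middle of the joined string and the result is not valid base64. A quick self-contained check (using Node's `Buffer` in place of base64-js for the encoding step):

```javascript
// Demonstrates why the slice length must be a multiple of 3 when each
// slice is base64-encoded on its own and the results are concatenated.
function encodeInSlices(bytes, sliceSize) {
  const parts = [];
  for (let i = 0; i < bytes.length; i += sliceSize) {
    parts.push(Buffer.from(bytes.subarray(i, i + sliceSize)).toString('base64'));
  }
  return parts.join('');
}

const data = new Uint8Array(10); // 10 zero bytes
const whole = Buffer.from(data).toString('base64');

console.log(encodeInSlices(data, 3) === whole); // true: multiple of 3, padding only at the very end
console.log(encodeInSlices(data, 4) === whole); // false: '=' padding appears mid-string
```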