javascript, html, google-chrome, large-files

Progressively read binary file in JavaScript


Using Chrome, I am trying to read and process a large (>4GB) binary file on my local disk. It looks like the FileReader API will only read the entire file, but I need to be able to read the file progressively as a stream.

This file contains a sequence of frames containing a 1-byte type identifier, a 2-byte frame length, an 8-byte time stamp, and then some binary data that has a format based on the type. The content of these frames will be accumulated, and I'd like to use HTML5+JavaScript to generate graphs and display other metrics as real-time playback based on the content of this file.
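For reference, a frame header in the layout described here (1-byte type, 2-byte length, 8-byte timestamp) could be decoded with a `DataView`; the little-endian byte order below is an assumption, so the real file's byte order would need checking:

```javascript
// Sketch: decode one frame header of the format described above.
// Little-endian is assumed here; adjust the boolean flags if the
// actual file is big-endian.
function readFrameHeader(buffer, offset) {
  const view = new DataView(buffer, offset);
  return {
    type: view.getUint8(0),                 // 1-byte type identifier
    length: view.getUint16(1, true),        // 2-byte frame length
    timestamp: view.getBigUint64(3, true),  // 8-byte time stamp
    headerSize: 11                          // bytes consumed by the header
  };
}

// Demo with a hand-built frame header: type 2, length 5, timestamp 1000n
const buf = new ArrayBuffer(11);
const v = new DataView(buf);
v.setUint8(0, 2);
v.setUint16(1, 5, true);
v.setBigUint64(3, 1000n, true);
console.log(readFrameHeader(buf, 0));
```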

Anybody have any ideas?


Solution

  • Actually, Files are Blobs, and Blob has a slice method, which we can use to grab smaller chunks of large files.

    I wrote the following snippet last week to filter large log files, but it shows the pattern you can use to loop, sub-section by sub-section, through big files.

    1. file is the file object
    2. fnLineFilter is a function that receives one line of the file and returns the line (or a transformed version) to keep it, or a falsy value to drop it
    3. fnComplete is a callback where the collected lines are passed as an array

    Here is the code I used:

     function fileFilter(file, fnLineFilter, fnComplete) {
         var mx = file.size,     // total bytes in the file
             BUFF_SIZE = 262144, // 256 KB per chunk
             i = 0,              // current chunk index
             collection = [],    // lines that passed the filter
             lineCount = 0;
         var d1 = +new Date();
         var remainder = "";     // partial line carried over between chunks
    
         function grabNextChunk() {
    
             // slice() gives us a small Blob covering just this chunk:
             var myBlob = file.slice(BUFF_SIZE * i, (BUFF_SIZE * i) + BUFF_SIZE, file.type);
             i++;
    
             var fr = new FileReader();
    
             fr.onload = function(e) {
    
                 // prepend the partial line left over from the previous chunk:
                 var str = remainder + e.target.result,
                     r = str.split(/\r?\n/);
    
                 // the last element may be an incomplete line; save it for next time:
                 remainder = r.pop();
                 lineCount += r.length;
    
                 var rez = r.map(fnLineFilter).filter(Boolean);
                 if (rez.length) {
                     [].push.apply(collection, rez);
                 } /* end if */
    
                 if ((BUFF_SIZE * i) >= mx) {
                     // flush the final partial line before finishing:
                     var last = remainder && fnLineFilter(remainder);
                     if (last) {
                         collection.push(last);
                     }
                     fnComplete(collection);
                     console.log("filtered " + file.name + " in " + (+new Date() - d1) + "ms");
                 } /* end if((BUFF_SIZE * i) >= mx) */
                 else {
                     setTimeout(grabNextChunk, 0); // yield to the UI between chunks
                 }
    
             };
             fr.readAsText(myBlob, myBlob.type);
         } /* end grabNextChunk() */
    
         grabNextChunk();
     } /* end fileFilter() */
    

    Obviously, you can get rid of the line finding and just grab pure byte ranges instead; I wasn't sure what type of data you need to dig through. The important thing is the slice mechanics, which are well demonstrated by the text-focused code above.
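    As a minimal sketch of that range-based variant: the same slice loop, but handing each chunk to the caller as raw bytes via `Blob.arrayBuffer()` (available in modern browsers) instead of splitting text into lines. The chunk size and callback name here are arbitrary choices for illustration.

```javascript
// Sketch: read a Blob/File in fixed-size byte ranges instead of lines.
// Each chunk arrives as a Uint8Array, suitable for binary parsing.
async function readChunks(blob, onChunk, CHUNK = 262144) {
  for (let pos = 0; pos < blob.size; pos += CHUNK) {
    const part = blob.slice(pos, pos + CHUNK);           // small Blob for this range
    const bytes = new Uint8Array(await part.arrayBuffer());
    onChunk(bytes, pos);                                 // hand bytes + offset to the caller
  }
}

// Demo: count the bytes of a small in-memory Blob in 4-byte chunks
const demo = new Blob(["0123456789"]);
let total = 0;
readChunks(demo, bytes => { total += bytes.length; }, 4)
  .then(() => console.log(total)); // prints 10
```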