Tags: javascript, file, firefox-addon-webextensions, lz4

Reading an LZ4 compressed text file (mozlz4) in WebExtensions (JavaScript, Firefox)


I'm porting a Firefox Add-on SDK extension to WebExtensions. Previously I could access the browser's search engines, but now I can't, so a helpful user suggested reading the search.json.mozlz4 file, which lists every installed engine. However, this file is JSON compressed with LZ4, in Mozilla's own LZ4 format, which starts with a custom magic number, 'mozLz40\0'.

Previously, this was enough to read an LZ4-compressed text file, including a mozlz4 file:

let bytes = OS.File.read(path, { compression: "lz4" });
let content = new TextDecoder().decode(bytes);

(I couldn't find any documentation for the "compression" field, but it works.)

Now, using WebExtensions, the best I could come up with to read a file is

var reader = new FileReader();
// Attach the handler before starting the read.
reader.onload = function(ev) {
    let content = ev.target.result;
};
reader.readAsText(file);

This does not handle compression in any way. This library handles LZ4, but it targets Node.js, so I assumed I couldn't use it. [Edit: it works standalone too.] However, even if I remove the custom magic number processing, I can't get it to decompress the file, whereas this Python code works as expected:

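import lz4.block  # the block API lives in the lz4.block submodule

with open("search.json.mozlz4", "rb") as file_obj:
    # The first 8 bytes are Mozilla's custom magic number.
    if file_obj.read(8) != b"mozLz40\0":
        raise ValueError("Invalid magic number")
    # The rest is a size-prefixed LZ4 block, which decompress() accepts by default.
    print(lz4.block.decompress(file_obj.read()))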

How can I do this in JS?


Solution

  • After much trial and error, I was finally able to read and decode the search.json.mozlz4 file in a WebExtension. You can use the node-lz4 library, though you only need one function from it: uncompress (aliased as decodeBlock for external access). I renamed it to decodeLz4Block and included it here with slight changes:

    // This method's code was taken from node-lz4 by Pierre Curto. MIT license.
    // CHANGES: Added ; to all lines. Reformatted one-liners. Removed n = eIdx. Fixed eIdx skipping end bytes if sIdx != 0.
    function decodeLz4Block(input, output, sIdx, eIdx)
    {
        sIdx = sIdx || 0;
        eIdx = eIdx || input.length;
    
        // Process each sequence in the incoming data
        for (var i = sIdx, j = 0; i < eIdx;)
        {
            var token = input[i++];
    
            // Literals
            var literals_length = (token >> 4);
            if (literals_length > 0) {
                // length of literals
                var l = literals_length + 240;
                while (l === 255) {
                    l = input[i++];
                    literals_length += l;
                }
    
                // Copy the literals
                var end = i + literals_length;
                while (i < end) {
                    output[j++] = input[i++];
                }
    
                // End of buffer?
                if (i === eIdx) {
                    return j;
                }
            }
    
            // Match copy
            // 2 bytes offset (little endian)
            var offset = input[i++] | (input[i++] << 8);
    
            // 0 is an invalid offset value
            if (offset === 0 || offset > j) {
                return -(i-2);
            }
    
            // length of match copy
            var match_length = (token & 0xf);
            var l = match_length + 240;
            while (l === 255) {
                l = input[i++];
                match_length += l;
            }
    
            // Copy the match
            var pos = j - offset; // position of the match copy in the current output
            var end = j + match_length + 4; // minmatch = 4
            while (j < end) {
                output[j++] = output[pos++];
            }
        }
    
        return j;
    }
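
The two `while (l === 255)` loops above both decode the same kind of length field: a 4-bit value from the token that, when it equals 15, is extended by following bytes until one of them is less than 255. As a rough illustration only, here is that step isolated in a helper. `readLz4Length` is a hypothetical name of mine, not something from node-lz4, and the sample bytes are made up:

```javascript
// Hypothetical helper (not part of node-lz4): decodes one LZ4 length field.
// `nibble` is the 4-bit value taken from the token; `pos` is the index of the
// first possible extension byte in `input`.
function readLz4Length(input, pos, nibble) {
    let length = nibble;
    if (nibble === 15) {
        // A nibble of 15 means "more bytes follow"; keep adding bytes
        // until one of them is smaller than 255.
        let extra;
        do {
            extra = input[pos++];
            length += extra;
        } while (extra === 255);
    }
    return { length: length, next: pos };
}

// Made-up sample: token 0xF2 has literal nibble 15 (extended) and match nibble 2.
const sampleBytes = new Uint8Array([0xF2, 255, 10]);
const sampleToken = sampleBytes[0];
const literals = readLz4Length(sampleBytes, 1, sampleToken >> 4);
// literals.length is 15 + 255 + 10 = 280, and literals.next is 3
const matchNibble = sampleToken & 0x0f; // 2; the decoder later adds minmatch (4)
```

In the real decoder the match length's extension bytes come after the 2-byte offset, so the same helper would be called at a different position for the match.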
    

    Then declare this function that receives a File object (not a path) and callbacks for success/error:

    function readMozlz4File(file, onRead, onError)
    {
        let reader = new FileReader();
    
        reader.onload = function() {
            let input = new Uint8Array(reader.result);
            let output;
            let uncompressedSize = input.length*3;  // size estimate for uncompressed data!
    
            // Decode whole file.
            do {
                output = new Uint8Array(uncompressedSize);
                uncompressedSize = decodeLz4Block(input, output, 8+4);  // skip 8 byte magic number + 4 byte data size field
                // if there's more data than our output estimate, create a bigger output array and retry (at most one retry)
            } while (uncompressedSize > output.length);
    
            // A negative result means the decoder hit an invalid match offset (corrupt data).
            if (uncompressedSize < 0) {
                if (onError) {
                    onError(new Error("Invalid LZ4 data in mozlz4 file"));
                }
                return;
            }
    
            output = output.slice(0, uncompressedSize); // remove excess bytes
    
            let decodedText = new TextDecoder().decode(output);
            onRead(decodedText);
        };
    
        if (onError) {
            reader.onerror = onError;
        }
    
        reader.readAsArrayBuffer(file); // read as bytes
    };
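
Instead of guessing the output size and retrying, it may be possible to read it from the 4-byte size field that the code above skips. The sketch below is only that, a sketch: `parseMozlz4Header` is a hypothetical helper name, and it assumes the size field is an unsigned 32-bit little-endian integer holding the uncompressed size (which is consistent with the Python snippet in the question working, but I haven't seen it documented). The sample bytes are made up.

```javascript
// Hypothetical helper: validates the mozlz4 magic number and reads the size field.
function parseMozlz4Header(bytes) {
    // "mozLz40\0" as byte values
    const MAGIC = [0x6d, 0x6f, 0x7a, 0x4c, 0x7a, 0x34, 0x30, 0x00];
    if (bytes.length < 12) {
        throw new Error("File too short to be mozlz4");
    }
    for (let k = 0; k < MAGIC.length; k++) {
        if (bytes[k] !== MAGIC[k]) {
            throw new Error("Invalid magic number");
        }
    }
    // Assumption: unsigned 32-bit little-endian uncompressed size at offset 8.
    const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    return { uncompressedSize: view.getUint32(8, true), dataOffset: 12 };
}

// Made-up sample: magic number, size field 0x00002710, one dummy data byte.
const sampleFile = new Uint8Array([
    0x6d, 0x6f, 0x7a, 0x4c, 0x7a, 0x34, 0x30, 0x00,
    0x10, 0x27, 0x00, 0x00,
    0x00
]);
const header = parseMozlz4Header(sampleFile);
// header.uncompressedSize is 0x2710 = 10000, header.dataOffset is 12
```

With that, the output array could be allocated as new Uint8Array(header.uncompressedSize) and decodeLz4Block called with header.dataOffset as the start index. Keeping the retry loop as a fallback is still reasonable if you don't trust the size field.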
    

    Then you can add an HTML file input to your add-on settings page that lets the user browse for and select search.json.mozlz4 (in WebExtensions you can't simply open any file in the filesystem without user intervention):

    <input name="selectMozlz4FileButton" type="file" accept=".json.mozlz4">
    

    To respond to the user selecting the file, use something like this, which calls the function we declared earlier (here I don't use the error callback, but you can):

    let button = document.getElementsByName("selectMozlz4FileButton")[0];
    button.onchange = function onButtonPress(ev) {
        let file = ev.target.files[0];
        readMozlz4File(file, function(text){
            console.log(text);
        });
    };
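
If you prefer Promises over callbacks, a reader with the (file, onRead, onError) shape used above can be wrapped. This is just one possible sketch: `toPromiseReader` is a hypothetical helper, and `fakeReadMozlz4File` with its placeholder JSON is a stand-in so the example runs on its own. In the add-on you would pass the real readMozlz4File and a real File object.

```javascript
// Hypothetical helper: adapts a reader of the form (file, onRead, onError)
// into a function that returns a Promise.
function toPromiseReader(readFn) {
    return function (file) {
        return new Promise(function (resolve, reject) {
            readFn(file, resolve, reject);
        });
    };
}

// Stand-in for readMozlz4File so this sketch is self-contained.
// The JSON text is a placeholder, not the real file contents.
function fakeReadMozlz4File(file, onRead, onError) {
    setTimeout(function () {
        if (file) {
            onRead('{"placeholder": true}');
        } else {
            onError(new Error("No file selected"));
        }
    }, 0);
}

const readMozlz4FileAsync = toPromiseReader(fakeReadMozlz4File);

readMozlz4FileAsync({ name: "search.json.mozlz4" })
    .then(function (text) {
        const data = JSON.parse(text); // the decompressed file is JSON
        console.log(Object.keys(data));
    })
    .catch(function (err) {
        console.error(err);
    });
```

One thing to watch: reader.onerror passes an event rather than an Error, so the rejection value may be either, depending on which path failed.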
    

    I hope this helps someone. I sure spent a lot of time working this simple thing out. :)