Read Raw Data in with Mozilla Add-on


I'm trying to read and write raw data from files using Mozilla's add-on SDK. Currently I'm reading data with something like:

function readnsIFile(fileName, callback){
    var nsiFile = new FileUtils.File(fileName);
    NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
        var data = NetUtil.readInputStreamToString(inputStream, inputStream.available(),{charset:"UTF-8"});
        callback(data, status, nsiFile);
    });
}

This works for text files, but when I start messing with raw bytes outside of Unicode's normal range, it doesn't work. For example, if a file contains the byte 0xff, then that byte and anything past that byte isn't read at all. Is there any way to read (and write) raw data using the SDK?


Solution

  • You've specified an explicit charset in the options to NetUtil.readInputStreamToString.

    When you omit the charset option, the data is read as raw bytes. (Source)

    function readnsIFile(fileName, callback){
        var nsiFile = new FileUtils.File(fileName);
        NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
            // Do not specify a charset at all!
            var data = NetUtil.readInputStreamToString(inputStream, inputStream.available());
            callback(data, status, nsiFile);
        });
    }
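
    To see why an explicit charset loses data, here is a minimal sketch that runs outside Firefox. It uses the standard TextDecoder rather than NetUtil, so the failure mode differs from the truncation described in the question, but the cause is the same: the byte 0xff never occurs in well-formed UTF-8. The helper names decodesAsUtf8 and toBinaryString are made up for this illustration.

```javascript
// Bytes of a hypothetical file: "A", the raw byte 0xff, "B".
var bytes = new Uint8Array([0x41, 0xff, 0x42]);

// Strict UTF-8 decoding rejects the input, because 0xff can never
// appear in well-formed UTF-8.
function decodesAsUtf8(buf) {
    try {
        new TextDecoder("utf-8", {fatal: true}).decode(buf);
        return true;
    } catch (e) {
        return false;
    }
}

// Byte-per-character decoding: each byte becomes the char code of one
// character, so no byte value is lost.
function toBinaryString(buf) {
    var s = "";
    for (var i = 0; i < buf.length; i++) {
        s += String.fromCharCode(buf[i]);
    }
    return s;
}

var raw = toBinaryString(bytes);
console.log(decodesAsUtf8(bytes)); // false
console.log(raw.length);           // 3
console.log(raw.charCodeAt(1));    // 255
```

    A string built this way is what you want from a raw read: charCodeAt(i) gives back the i-th byte of the file.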
    

    The suggestion to use io/byte-streams is OK as well, but keep in mind that that SDK module is still marked experimental, and that using ByteReader via io/file as the example suggests is not a good idea because this would be sync I/O on the main thread. I don't really see the upside, as you'd use NetUtil anyway.

    Anyway, this should work:

    const {ByteReader} = require("sdk/io/byte-streams");
    function readnsIFile(fileName, callback){
        var nsiFile = new FileUtils.File(fileName);
        NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
            var reader = new ByteReader(inputStream);
            // Without a byte count, read() returns the rest of the stream.
            var data = reader.read();
            reader.close();
            callback(data, status, nsiFile);
        });
    }
    

    Also, please keep in mind that reading large files like this is problematic. Not only will the whole file be buffered in memory, obviously, but:

    • The file is read as a char (byte) array first, so there will be a temporary buffer in the stream of at least file.size length (via asyncFetch).
    • Both NetUtil.readInputStreamToString and ByteReader will use another char (byte) array to read the result into from the inputStream, but ByteReader will do that in 32K chunks, while NetUtil.readInputStreamToString will use one big buffer of file.size length.
    • The data is then read into the resulting jschar/wchar_t (word) array aka. Javascript string, i.e. you need file.size * 2 bytes in memory at least.

    E.g., reading a 1MB file would require more than fileSize * 4 = 4MB of memory (NetUtil.readInputStreamToString) or more than fileSize * 3 = 3MB of memory (ByteReader) during the read operation. After the operation, 2MB of that memory will still be alive to store the resulting data in a Javascript string.
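
    The buffer accounting above can be written out as a small calculation. This is only a sketch of the arithmetic, not a measurement; the function name estimateReadMemory and its field names are made up for illustration, and the results are lower bounds.

```javascript
// ByteReader copies from the stream in 32K chunks (per the list above).
var BYTE_READER_CHUNK = 32 * 1024;

// Rough lower bound on memory used while reading a file of fileSize
// bytes into a Javascript string.
function estimateReadMemory(fileSize) {
    var streamBuffer = fileSize;     // asyncFetch buffers the whole file
    var resultString = fileSize * 2; // 2 bytes per character in the string
    return {
        // readInputStreamToString: one more buffer of the full file size
        netUtilPeak: streamBuffer + fileSize + resultString,
        // ByteReader: only a chunk-sized intermediate buffer
        byteReaderPeak: streamBuffer +
            Math.min(BYTE_READER_CHUNK, fileSize) + resultString,
        // what stays alive after the read: the string itself
        retained: resultString
    };
}

var MB = 1024 * 1024;
var est = estimateReadMemory(1 * MB);
console.log(est.netUtilPeak / MB);    // 4
console.log(est.byteReaderPeak / MB); // 3.03125
console.log(est.retained / MB);       // 2
```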

    Reading a 1MB file might be OK, but a 10MB file might already be problematic on mobile (Firefox for Android, Firefox OS), and a 100MB file would be problematic even on desktop.

    You can also read the data directly into an ArrayBuffer (or Uint8Array), which stores byte arrays more efficiently than a Javascript string and avoids the temporary buffers of NetUtil.readInputStreamToString and/or ByteReader.

    const {Cc, Ci} = require("chrome");

    function readnsIFile(fileName, callback){
        var nsiFile = new FileUtils.File(fileName);
        NetUtil.asyncFetch(nsiFile, function (inputStream, status) {
            // Wrap the raw stream so it can be read as binary data.
            var bs = Cc["@mozilla.org/binaryinputstream;1"].
                createInstance(Ci.nsIBinaryInputStream);
            bs.setInputStream(inputStream);
            var len = inputStream.available();
            var data = new Uint8Array(len);
            // Read straight into the typed array's backing buffer.
            bs.readArrayBuffer(len, data.buffer);
            bs.close();
            callback(data, status, nsiFile);
        });
    }
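
    If other code still hands you the byte-per-character string produced by the string-based variants, it can be packed into a Uint8Array afterwards. The helper below is a hypothetical sketch (binaryStringToBytes is not part of the SDK); it also shows the storage difference under the 2-bytes-per-character accounting used above.

```javascript
// Hypothetical helper: pack a byte-per-character string into a Uint8Array.
function binaryStringToBytes(str) {
    var out = new Uint8Array(str.length);
    for (var i = 0; i < str.length; i++) {
        out[i] = str.charCodeAt(i) & 0xff; // keep only the low byte
    }
    return out;
}

// Four raw bytes (0x00, 0x7f, 0x80, 0xff) held as a string.
var asString = "\u0000\u007f\u0080\u00ff";
var asBytes = binaryStringToBytes(asString);

console.log(Array.from(asBytes));  // [ 0, 127, 128, 255 ]
console.log(asBytes.byteLength);   // 4 (one byte per element)
console.log(asString.length * 2);  // 8 (two bytes per character)
```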
    

    PS: The MDN documentation might state something about "iso-8859-1" being the default if the charset option is omitted in the NetUtil.readInputStreamToString call, but the documentation is wrong. I'll fix it.