Tags: javascript, firefox-addon, mozilla, xulrunner

Mozilla JavaScript performance: new OS.File vs. old nsIFile over 3000 files


I have a directory that contains small XML files (each file is 170~200 bytes). I want to read the content of every file and merge everything into a single XML document, displayed in a tree.

OLD

FileUtils.File + NetUtil.asyncFetch + NetUtil.readInputStreamToString

Time to read 3000 XML files 1112.3642930000005 msec

NEW

OS.File.DirectoryIterator + OS.File.read

Time to read 3000 XML files 5330.708094999999 msec

I noticed an enormous difference in the read time per file: OLD takes 0.08~0.12 msec, while NEW takes 0.5~6.0 msec (6.0 is not a typo; I saw some time peaks that never occur with OLD).

I know that the OLD one is backed by C++, but the documentation at https://developer.mozilla.org/en-US/docs/Mozilla/JavaScript_code_modules/OSFile.jsm says:

OS.File is a new API designed for efficient, off-main thread, manipulation of files by privileged JavaScript code.

I don't see the efficiency of the NEW API. Is there something wrong in my code?

N.B.: dbgPerf is a performance-debugging helper that collects a timestamp and a comment in an array of objects, and performs all calculations only when I call its end function once everything has finished. It does not affect performance.
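The real dbgPerf is not shown in the question. As a rough illustration of the pattern described (record cheaply, compute at the end), a helper of this kind might look like the sketch below. The names createPerfLog, add and end, and the report fields, are assumptions made for this example only.

```javascript
// Hypothetical sketch: NOT the asker's dbgPerf, just the same idea.
function createPerfLog() {
    var entries = [];

    // Use the high-resolution clock when the host provides one.
    function now() {
        return (typeof performance !== "undefined") ? performance.now() : Date.now();
    }

    return {
        // Record a timestamp and a comment; no arithmetic or output happens here.
        add: function (comment) {
            entries.push({ time: now(), comment: comment });
        },
        // Compute every delta in one pass, after the measured work is done.
        end: function () {
            return entries.map(function (entry, i) {
                return {
                    comment: entry.comment,
                    sinceStart: entry.time - entries[0].time,
                    sincePrevious: i === 0 ? 0 : entry.time - entries[i - 1].time
                };
            });
        }
    };
}
```

Each add call costs one clock read and one array push, so the measurement overhead per call stays small and roughly constant.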

Code using nsIFile:

this._readDir2 = function (pathToTarget, callbackEndLoad) {

    var _content = '';
    dbgPerf.add("2 start read dir");

    var fuDir = new FileUtils.File(pathToTarget);
    var entries = fuDir.directoryEntries;
    var files = [];
    while (entries.hasMoreElements()) {

        var entry = entries.getNext();
        entry = entry.QueryInterface(OX.LIB.Ci.nsIFile);

        if (entry.isFile()) {

            var channel = NetUtil.newChannel(entry);
            files.push(channel);
            dbgPerf.add("ADD file" + entry.path);
        } else {
            dbgPerf.add("NOT a file" + entry.path);
        }
    }

    var totalFiles = files.length;
    var totalFetched = 0;

    for (var a = 0; a < files.length; a++) {

        var entry = files[a];

        dbgPerf.add("start asynch file " + entry.name);
        NetUtil.asyncFetch(entry, function (inputStream, status) {

            totalFetched++;

            if (!Components.isSuccessCode(status)) {
                dbgPerf.add('asyncFetch failed for reason ' + status);
                return;
            } else {

                _content += NetUtil.readInputStreamToString(inputStream, inputStream.available());
                dbgPerf.add("process end file " + entry.name);
            }

            if (totalFetched == files.length) {

                var parser = new DOMParser();

                _content = _content.replace(/<root>/g, '');
                _content = _content.replace(/<\/root>/g, '');
                _content = _content.replace(/<catalog>/g, '');
                _content = _content.replace(/<\/catalog>/g, '');
                _content = _content.replace(/<\?xml[\s\S]*?\?>/g, '');

                xmlDoc = parser.parseFromString('<?xml version="1.0" encoding="utf-8"?><root>' + _content + '</root>', "text/xml");
                //dbgPerf.add("2 end parsing XML file " + arrFileData);

                var response = {};
                response.total = totalFiles;
                response.xml = xmlDoc;

                callbackEndLoad(response);
            }
        });
    }

    dbgPerf.add("2 AFTER REQUEST ALL FILE");
};

Code using OS.File:

this._readDir = function (pathToTarget, callbackEndLoad) {

    dbgPerf.add("1 start read dir");

    var xmlDoc;
    var arrFileData = '';

    var iterator = new OS.File.DirectoryIterator(pathToTarget);

    var files = [];
    iterator.forEach(function onEntry(entry) {
        if (!entry.isDir) {
            files.push(entry.path);
        }
    });

    var totalFetched = 0;

    files.forEach(function (fpath) {

        Task.spawn(function () {

            arrFileData += OS.File.read(fpath, {
                encoding: "utf-8"
            });

            totalFetched++;

            if (totalFetched == files.length) {

                var parser = new DOMParser();

                arrFileData = arrFileData.replace(/<root>/g, '');
                arrFileData = arrFileData.replace(/<\/root>/g, '');
                arrFileData = arrFileData.replace(/<catalog>/g, '');
                arrFileData = arrFileData.replace(/<\/catalog>/g, '');
                arrFileData = arrFileData.replace(/<\?xml[\s\S]*?\?>/g, '');

                xmlDoc = parser.parseFromString('<?xml version="1.0" encoding="utf-8"?><root>' + arrFileData + '</root>', "text/xml");
                dbgPerf.add("1 end parsing XML file " + arrFileData);

                var response = {};
                response.xml = xmlDoc;

                callbackEndLoad(response);
            }
        });
    });
};

Solution

  • I'm the author of OS.File.

    We ran some benchmarks of nsIFile vs. OS.File back in the day. If you were to rewrite either nsIFile to work in a background thread (which is not possible, by design of XPConnect) or OS.File to work in the main thread (which we made impossible, to avoid blocking the UX), you would find, in most cases that I recall, that OS.File is faster.

    As mentioned, OS.File is designed specifically to not perform any work in the main thread. That's because I/O tasks have unpredictable duration: in extreme cases, the simple act of closing a file can block the thread for several seconds, which is unacceptable in the main thread.

    A consequence of this is that what you are benchmarking is actually the following:

    1. Serialize the request and send it to the OS.File thread;
    2. Perform the actual I/O;
    3. Serialize the response and send it to the main thread;
    4. Wait until the next tick of the main thread (which is when the main thread actually receives the response);
    5. Deserialize the response;
    6. Trigger the then callback and wait until the next tick of the main thread (by definition of Promise).

    The I/O efficiency is in step (2): OS.File is often much smarter than nsIFile and so performs less I/O. That's better for battery life, better for being a good citizen and playing nicely with other processes, and better for any other I/O performed in the same thread. The responsiveness comes from the fact that we perform as little work as possible in the main thread. But if your code is executed in the main thread, the total throughput is often going to be much lower than with nsIFile, because of steps (1), (3), (4), (5) and (6).
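To make the round trip in steps (1) to (6) concrete, here is a small simulation. It is not OS.File: fakeOffThreadRead and readSequentially are hypothetical names, the "I/O" does no work at all, and setTimeout stands in for the message passing between threads. Whatever time it reports is therefore pure per-call overhead as seen by the caller.

```javascript
// Simulation only: a hypothetical stand-in for an off-main-thread read.
function fakeOffThreadRead(path) {
    // (1) Serialize the request, as if posting it to a worker thread.
    var request = JSON.stringify({ path: path });
    return new Promise(function (resolve) {
        // (2)-(4) The reply can only arrive on a later turn of the event loop.
        setTimeout(function () {
            var reply = JSON.stringify({
                path: JSON.parse(request).path,
                data: "<root/>" // pretend file content; no real I/O happens
            });
            // (5) Deserialize the reply. (6) `then` callbacks run after this.
            resolve(JSON.parse(reply).data);
        }, 0);
    });
}

// Reads the paths one after another and reports the cost the caller observes.
function readSequentially(paths) {
    var start = Date.now();
    var results = [];
    var chain = Promise.resolve();
    paths.forEach(function (path) {
        chain = chain.then(function () {
            return fakeOffThreadRead(path);
        }).then(function (data) {
            results.push(data);
        });
    });
    return chain.then(function () {
        return { results: results, elapsedMs: Date.now() - start };
    });
}
```

Calling readSequentially with a few thousand paths and logging elapsedMs gives a feel for how much of a per-file timing can be scheduling rather than disk access. The absolute numbers depend on the host's timer granularity and say nothing about OS.File itself.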

    I hope this answers your question.

    PS: Your snippets are wrong. For one thing, they are inverted. Also, you forgot a yield in the call to OS.File.read, so the code concatenates the promise returned by OS.File.read rather than the file contents.
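For illustration, a version of the OS.File snippet with the yield in place could look like the sketch below. It is not taken from the answer. It assumes that OS and Task are the objects exported by osfile.jsm and Task.jsm (passed in as parameters here to make that explicit), and that the Task.jsm in use accepts function* generators. The name readAllFiles is made up for this example. The directory listing is yielded as well, because DirectoryIterator's forEach also returns a promise.

```javascript
// Sketch only. `OS` and `Task` are assumed to come from osfile.jsm and Task.jsm.
function readAllFiles(OS, Task, pathToTarget) {
    return Task.spawn(function* () {
        var iterator = new OS.File.DirectoryIterator(pathToTarget);
        var paths = [];
        try {
            // forEach is asynchronous: wait for it before using `paths`.
            yield iterator.forEach(function (entry) {
                if (!entry.isDir) {
                    paths.push(entry.path);
                }
            });
        } finally {
            // Release the directory handle even if the listing failed.
            iterator.close();
        }

        // Start every read up front so the round trips overlap, then wait
        // for all of them. Promise.all keeps the results in `paths` order.
        var contents = yield Promise.all(paths.map(function (path) {
            return OS.File.read(path, { encoding: "utf-8" });
        }));

        return contents.join('');
    });
}
```

The returned promise resolves to the concatenated text, which can then be cleaned up and handed to DOMParser as in the question. Waiting on a single Promise.all also replaces the shared totalFetched counter.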