node.js mmap memory-mapped-files memory-mapping

How would I design and implement a non-blocking memory mapping module for node.js


There exists the mmap module for node.js: https://github.com/bnoordhuis/node-mmap/

As the author, Ben Noordhuis, notes, accessing mapped memory can block, which is why he no longer recommends the module and has discontinued it.

So I wonder: how would I design a non-blocking memory mapping module for node.js? Threading, fibers, something else?

This naturally raises the question of whether threading in node.js would simply move the blocking somewhere other than the request handler.


Solution

  • When implementing a native facility in a non-blocking fashion, the first place to look is libuv. It is how node's core modules interface with the underlying platform. Of particular interest is the work queue API (uv_queue_work).

    If we take a quick look at node-mmap's source, we see that it's actually extremely simple. It calls mmap and returns a node Buffer that wraps the mapped memory region.

    Reading from this Buffer is what results in the OS performing I/O. Because that will necessarily happen on the JS thread, we end up blocking the JS thread with disk I/O.

    Instead of returning a Buffer that allows JS direct access to the mapped memory, you should write a wrapper class in C++ that marshals reads and writes through the work queue. In this way, the disk I/O will happen on a separate thread.

    In JS, you'd use it something like this:

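    var fs = require('fs');
    // `mmap` is the proposed wrapper module; `mapped.read` is hypothetical and not part of node-mmap.
    fs.open('/path/to/file', 'r', function(err, fd) {
        if (err) throw err;
        fs.fstat(fd, function(err, stats) {
            if (err) throw err;
            var mapped = mmap.map(stats.size, mmap.PROT_READ, mmap.MAP_SHARED, fd, 0);
            // start and len: byte offset and length of the range the caller wants
            mapped.read(start, len, function(err, data) {
                // data would be a Buffer filled in on a worker thread
            });
        });
    });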
    

    On the native side, the read function would create a libuv work request and queue it with uv_queue_work. The worker function would then read the requested range of mapped memory. That may cause disk I/O, but this is safe because it happens on a thread-pool thread rather than on the JS thread.

    What happens next is interesting. The safe approach would be for the worker to allocate a new chunk of memory and memcpy the requested range from the mapped memory into it. The worker then passes a pointer to the copy to the completion callback, which runs back on the JS thread and wraps it in a Buffer to be returned to JS-land.

    You could also try reading over the range (so that any necessary I/O happens on the worker thread) but not actually doing anything with the data, and then having the C callback simply wrap the mapped memory range in a Buffer. In theory, the parts of the file that the worker read would stay in RAM, so access to that portion of mapped memory would not block. However, I honestly don't know enough about mapped memory to say whether this might end up biting you.
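
    The difference between these two options can be illustrated in plain JavaScript. In this minimal sketch an ordinary Buffer stands in for the mapped region (no real mmap is involved), and readCopy and readView are hypothetical helper names, not part of node-mmap or libuv. It only shows copy-versus-wrap semantics, not page-fault behaviour.

```javascript
// Assumption: a plain Buffer stands in for the mapped region; this is not real mapped memory.
const region = Buffer.from('hello mapped world');

// Copy approach: allocate fresh memory and copy the requested range into it.
function readCopy(source, start, len) {
    const out = Buffer.alloc(len);
    source.copy(out, 0, start, start + len);
    return out;
}

// Wrap approach: return a view that shares memory with the source.
function readView(source, start, len) {
    return source.subarray(start, start + len);
}

const copied = readCopy(region, 6, 6);
const viewed = readView(region, 6, 6);

// Simulate the underlying region changing after the read completed.
region.write('MAPPED', 6);

console.log(copied.toString()); // the copy still holds the old bytes
console.log(viewed.toString()); // the view reflects the change
```

    The copy is decoupled from whatever happens to the region later, which is why it is the safe choice. The view costs no extra memory, but every access through it touches the underlying region directly.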


    Finally, I'm dubious about whether this will actually provide any extra performance over node's regular fs methods. I would only go down this road if I were doing something that really justifies using mmap.
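
    For comparison, here is a rough sketch of the same read(start, len, callback) shape built on node's regular fs methods. Asynchronous fs.read with an explicit position already runs on libuv's thread pool, so it does not block the JS thread. readRange is a hypothetical helper name chosen for illustration.

```javascript
const fs = require('fs');

// Hypothetical helper: read `len` bytes starting at byte offset `start`
// and call back with a Buffer holding the bytes actually read.
function readRange(path, start, len, callback) {
    fs.open(path, 'r', function (err, fd) {
        if (err) return callback(err);
        const buf = Buffer.alloc(len);
        // The explicit position argument makes this a positioned read,
        // so the file's current offset is not used or changed.
        fs.read(fd, buf, 0, len, start, function (readErr, bytesRead) {
            fs.close(fd, function (closeErr) {
                if (readErr || closeErr) return callback(readErr || closeErr);
                // Near the end of the file fewer than `len` bytes may come back.
                callback(null, buf.subarray(0, bytesRead));
            });
        });
    });
}

// Usage (the path is a placeholder):
// readRange('/path/to/file', 0, 16, function (err, data) {
//     if (err) throw err;
//     console.log(data.toString());
// });
```

    This sketch opens and closes the file on every call to stay short. A real module would presumably keep the descriptor open across reads.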