node.js, cryptography, checksum

Node.js & Crypto: Store the current state of a crypto.createHash Instance to reuse it later


How can I store the current state of crypto.createHash('sha1') (after it has been fed data with hash.update(buffer)) so that I can use it in another HTTP request, which might be handled by a different Node.js process?

I imagine doing something like this:

var crypto = require('crypto'),
    hash   = someDatabase.read('hashstate')    // continue with filled hash
             || crypto.createHash('sha1');     // start a new hash

// update the hash
someObj.on('data', function(buffer){
    hash.update(buffer);
});
someObj.on('end', function(){
    // store the current state of hash to retrieve it later (this won't work:)
    someDatabase.write('hashstate', hash);

    if(theEndOfAllRequests){
        // create the result of multiple http requests
        hash.digest('hex');
    }
});

Solution

  • I can think of a couple of options, each with different trade-offs. The key point is that crypto doesn't expose the partial state of its hash objects, so there's no way to directly implement your plan of saving the state to a database.

    Option 1 involves diving into the internals of a hash function, which can be tricky. Fortunately, there is already one written in JavaScript. It doesn't expose its state either, but I don't expect that exposing it would be a terribly difficult code transformation. I believe the entire state is stored in the variables defined at the top of create: h0 through h4, block, offset, shift, and totalLength. With those exposed, you could save the state in a database as you planned.
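To make Option 1 concrete, here is a minimal sketch of a SHA-1 written so that its state can be serialized and restored. It is not the library mentioned above: the names createSha1 and saveState, and the JSON layout of the saved state, are assumptions made up for this illustration. It keeps roughly the same state described above (five 32-bit words, the bytes that don't yet fill a block, and the total length).

```javascript
'use strict';

// Hypothetical resumable SHA-1. All state is kept in plain values so it can
// be serialized to a string and restored later, possibly in another process.
function createSha1(saved) {
  var s = saved ? JSON.parse(saved) : null;
  var h = s ? s.h : [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0];
  // Bytes received so far that do not yet fill a complete 64-byte block.
  var pending = s ? Buffer.from(s.pending, 'hex') : Buffer.alloc(0);
  var totalLength = s ? s.totalLength : 0; // total bytes hashed so far

  function rotl(x, n) { return (x << n) | (x >>> (32 - n)); }

  // Run the SHA-1 compression function over one 64-byte block, updating state.
  function processBlock(block, state) {
    var w = new Array(80), i;
    for (i = 0; i < 16; i++) w[i] = block.readInt32BE(i * 4);
    for (i = 16; i < 80; i++) w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    var a = state[0], b = state[1], c = state[2], d = state[3], e = state[4];
    for (i = 0; i < 80; i++) {
      var f, k;
      if (i < 20) { f = (b & c) | (~b & d); k = 0x5a827999; }
      else if (i < 40) { f = b ^ c ^ d; k = 0x6ed9eba1; }
      else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8f1bbcdc; }
      else { f = b ^ c ^ d; k = 0xca62c1d6; }
      var t = (rotl(a, 5) + f + e + k + w[i]) | 0;
      e = d; d = c; c = rotl(b, 30); b = a; a = t;
    }
    state[0] = (state[0] + a) | 0;
    state[1] = (state[1] + b) | 0;
    state[2] = (state[2] + c) | 0;
    state[3] = (state[3] + d) | 0;
    state[4] = (state[4] + e) | 0;
  }

  var api = {
    update: function (data) {
      var input = Buffer.isBuffer(data) ? data : Buffer.from(data);
      totalLength += input.length;
      var buf = Buffer.concat([pending, input]);
      var off = 0;
      for (; off + 64 <= buf.length; off += 64) processBlock(buf.subarray(off, off + 64), h);
      pending = buf.subarray(off);
      return api;
    },
    // Pads a copy of the state, so the hash can still be updated or saved afterwards.
    digest: function () {
      var bitLength = totalLength * 8;
      var padLength = (pending.length < 56 ? 56 : 120) - pending.length;
      var tail = Buffer.alloc(padLength + 8);
      tail[0] = 0x80;
      tail.writeUInt32BE(Math.floor(bitLength / 0x100000000), padLength);
      tail.writeUInt32BE(bitLength >>> 0, padLength + 4);
      var last = Buffer.concat([pending, tail]);
      var out = h.slice();
      for (var off = 0; off < last.length; off += 64) processBlock(last.subarray(off, off + 64), out);
      return out.map(function (x) {
        return ('00000000' + (x >>> 0).toString(16)).slice(-8);
      }).join('');
    },
    // Returns a string that can be written to a database and passed back to createSha1.
    saveState: function () {
      return JSON.stringify({ h: h, pending: pending.toString('hex'), totalLength: totalLength });
    }
  };
  return api;
}
```

With something like this, the 'end' handler in the question could write hash.saveState() to the database, and the next request could continue with createSha1(someDatabase.read('hashstate')). A pure JavaScript hash will likely be slower than the native crypto module, so this trades speed for portability of the state.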

    Option 2 involves using crypto and passing the data to be hashed between processes. I think this is a lot easier to work with, but also a lot slower. In a few quick tests, messages passed between processes at about 2.5-3MB/sec, so each 3MB chunk would take about 1.5 seconds (you can only pass strings, so I expect you'll need a Base64 conversion, which adds about 33% to the size). To do this, each worker would use process.send to send the data along with an identifying id. The master process would listen with worker.on('message', ...) on each worker and keep a mapping of ids to hash objects. Finally, you would want a flag in the message that tells the master it is receiving the last chunk, at which point it would worker.send the resulting digest back (received in the worker with process.on('message', ...)).
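A rough sketch of the message handling for Option 2 might look like the following. The message shape ({ id, data, last }) and the function names handleChunk, attachWorker, and sendChunk are assumptions invented for this example; only process.send, worker.on('message'), worker.send, and the crypto calls come from Node.js itself.

```javascript
'use strict';
var crypto = require('crypto');

// Master-side state: one in-progress hash per id.
var hashes = new Map();

// Assumed message shape: { id: string, data: Base64 string, last: boolean }.
// Returns null while the hash is still in progress, or { id, digest } once
// the message flagged as last has been processed.
function handleChunk(msg) {
  var hash = hashes.get(msg.id);
  if (!hash) {
    hash = crypto.createHash('sha1');
    hashes.set(msg.id, hash);
  }
  if (msg.data) hash.update(Buffer.from(msg.data, 'base64'));
  if (!msg.last) return null;
  hashes.delete(msg.id);
  return { id: msg.id, digest: hash.digest('hex') };
}

// In the master: call this once for each worker returned by cluster.fork().
function attachWorker(worker) {
  worker.on('message', function (msg) {
    var result = handleChunk(msg);
    if (result) worker.send(result); // reply only when the hash is complete
  });
}

// In a worker: forward one chunk to the master. process.send is only
// available when the process was started with an IPC channel (e.g. by cluster).
function sendChunk(id, buffer, last) {
  process.send({ id: id, data: buffer.toString('base64'), last: !!last });
}
```

The worker would pick up the final digest with process.on('message', ...) and match it to the request by id. Keeping the hash objects in the master means nothing has to be serialized, at the cost of funnelling all data through one process.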

    I'd be happy to elaborate on whichever of these sounds most suitable.