javascript node.js hash hashcode

Hash of folders in nodejs


In my project, I want to calculate a hash for each of several folders. For example, there are 10 folders, and each contains many files. I know many ways to get the hash of a file, but is there any way to get a hash of an entire folder?

My purpose in doing this is to understand if the files in the folder have changed.

I am open to suggestions and different ideas. Thanks in advance for your help.


Solution

  • It really depends upon how reliable you want your modification detection to be. The most reliable method would iterate through every file in every folder and calculate a hash of the actual file contents by reading every byte of every file.

    Short of that, you can examine file metadata such as filename, modification date, and file size. A change in any of those is a strong sign that something changed (although a modification date can change without the bytes changing). But the lack of a change does not conclusively show that the file contents have not changed. It is possible to modify the file contents, keep the same filename, keep the same file size, and manually set the modification date back to what it was, thus fooling an examination of only the metadata.
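To make that blind spot concrete, here is a small sketch that rewrites a file with different bytes of the same length and then resets its timestamps. The function name `demoMetadataBlindSpot`, the temp-folder prefix, and the fixed timestamp are all made up for illustration; only standard `fs/promises` calls are used.

```javascript
const fsp = require("fs/promises");
const os = require("os");
const path = require("path");

// Hypothetical demo: change a file's contents without changing its size,
// restore its timestamps, and report whether size and mtime still match.
async function demoMetadataBlindSpot() {
    const dir = await fsp.mkdtemp(path.join(os.tmpdir(), "meta-demo-"));
    const file = path.join(dir, "data.txt");
    // an arbitrary fixed timestamp, chosen only so the demo is repeatable
    const fixedTime = new Date(1600000000000);

    await fsp.writeFile(file, "AAAA");
    await fsp.utimes(file, fixedTime, fixedTime);
    const before = await fsp.stat(file);

    // same length, different bytes
    await fsp.writeFile(file, "BBBB");
    // put the access and modification times back to what they were
    await fsp.utimes(file, fixedTime, fixedTime);
    const after = await fsp.stat(file);
    const contents = await fsp.readFile(file, "utf8");

    await fsp.rm(dir, { recursive: true, force: true });
    return {
        sameSize: before.size === after.size,
        sameMtime: before.mtimeMs === after.mtimeMs,
        contents,
    };
}
```

If both `sameSize` and `sameMtime` come back true while `contents` differs from what was first written, a hash built only from name, size, and modification time would not notice the edit.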

    But if you're willing to accept that it could be fooled by deliberate manipulation, yet would normally detect changes, then you could iterate over all the files in a folder and compute a single combined hash from the metadata: the filenames, the file sizes, and the file modification dates. Depending upon your purpose, that may or may not be sufficient. You would have to make that call.

    Other than that, you're going to have to read every byte of every file and compute a hash of the actual file contents.
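A minimal sketch of that content-based approach might look like the following. The names `computeContentHash`, `walkContents`, and `hashFileInto` are hypothetical, and the choices to mix in each file's relative path, to sort directory entries, and to use SHA-256 are assumptions of this sketch rather than requirements.

```javascript
const fsp = require("fs/promises");
const { createReadStream } = require("fs");
const { createHash } = require("crypto");
const path = require("path");

// Hypothetical helper: streams every byte of one file into an existing Hash object
async function hashFileInto(hash, filePath) {
    for await (const chunk of createReadStream(filePath)) {
        hash.update(chunk);
    }
}

// Hypothetical helper: walks a folder tree and feeds paths and contents into the hash
async function walkContents(folder, root, hash) {
    const entries = await fsp.readdir(folder, { withFileTypes: true });
    // sort by name so the result does not depend on the order readdir returns
    entries.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
    for (const entry of entries) {
        const fullPath = path.join(folder, entry.name);
        if (entry.isFile()) {
            // include the path relative to the root so renames change the hash;
            // note that path separators differ between platforms
            hash.update(path.relative(root, fullPath) + "\0");
            await hashFileInto(hash, fullPath);
        } else if (entry.isDirectory()) {
            await walkContents(fullPath, root, hash);
        }
    }
}

// Resolves to a hex string that changes when any file's path or bytes change
async function computeContentHash(folder) {
    const hash = createHash("sha256");
    await walkContents(folder, folder, hash);
    return hash.digest("hex");
}
```

Calling `computeContentHash(someFolder)` resolves to a 64-character hex string. Because every byte is read, this is considerably slower on large trees than a metadata-only hash, which is the trade-off described above.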

    Here's some demonstration code of the metadata hashing algorithm:

    const fsp = require("fs/promises");
    const { createHash } = require("crypto");
    const path = require('path');
    
    // -----------------------------------------------------
    // Resolves to a Buffer holding a computed hash of every file's metadata:
    //    full path, modification time and filesize
    // If you pass inputHash, it must be a Hash object from the crypto library
    //   and you must then call .digest() on it yourself when you're done
    // If you don't pass inputHash, then one will be created automatically
    //   and the digest will be returned to you in a Buffer object
    // -----------------------------------------------------
    
    async function computeMetaHash(folder, inputHash = null) {
        const hash = inputHash ? inputHash : createHash('sha256');
        const info = await fsp.readdir(folder, { withFileTypes: true });
        // readdir does not promise any particular order, so sort by name
        // to keep the hash stable from run to run
        info.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
        for (const item of info) {
            const fullPath = path.join(folder, item.name);
            if (item.isFile()) {
                const statInfo = await fsp.stat(fullPath);
                // build the string path:size:mtime, newline-terminated so that
                // consecutive entries cannot run together
                const fileInfo = `${fullPath}:${statInfo.size}:${statInfo.mtimeMs}\n`;
                hash.update(fileInfo);
            } else if (item.isDirectory()) {
                // recursively walk sub-folders
                await computeMetaHash(fullPath, hash);
            }
        }
        // if not being called recursively, get the digest and return it as the hash result
        if (!inputHash) {
            return hash.digest();
        }
    }
    
    computeMetaHash(__dirname).then(result => {
        // result is a Buffer; print it as a hex string
        console.log(result.toString("hex"));
    }).catch(err => {
        console.error(err);
    });
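
Since the goal in the question is to find out whether a folder has changed, the digest has to be compared with one saved earlier. Here is a rough sketch of that step, assuming the previous digest is kept as hex text in a small state file. The name `folderChanged` and the `stateFile` and `computeHash` parameters are hypothetical; `computeHash` can be any async function that resolves to a Buffer for a folder, such as `computeMetaHash` above.

```javascript
const fsp = require("fs/promises");

// Hypothetical helper: compares a folder's current digest with the one saved
// by the previous run, then saves the current digest for the next run.
async function folderChanged(folder, stateFile, computeHash) {
    const current = (await computeHash(folder)).toString("hex");
    let previous = null;
    try {
        previous = (await fsp.readFile(stateFile, "utf8")).trim();
    } catch (err) {
        // a missing state file just means this is the first run
        if (err.code !== "ENOENT") throw err;
    }
    await fsp.writeFile(stateFile, current);
    // on the first run there is nothing to compare against, so report a change
    return previous !== current;
}
```

The state file should live outside the folder being hashed. Otherwise, writing it would itself alter the folder and every run would report a change.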