Tags: javascript, python, node.js, hashlib, object-hash

Why do the packages object-hash and crypto / hashlib return different values for sha1?


I have a JavaScript frontend which compares two object-hash sha1 hashes to determine whether an input has changed (in which case a processing pipeline needs to be rerun).

I started building a Python interface to interact with the same backend. It uses hashlib for the sha1 generation, but unfortunately the two functions return different hash values even though the inputs are the same.

I managed to produce the same hash values as hashlib using crypto, which means that the issue arises from object-hash.

hashlib

import json
import hashlib

data = {
    'key1': 'value1',
    'key2': 'value2',
    'key3': 'value3',
}

json_data = json.dumps(data, separators=(',', ':')).encode('utf-8')

hash = hashlib.sha1()
hash.update(json_data)
print(hash.hexdigest())

# outputs f692755b3c38bc6b0dc376d775db8b07d6d5f256

crypto

const crypto = require('crypto');

const data = {
    key1: 'value1',
    key2: 'value2',
    key3: 'value3',
};

const stringData = JSON.stringify(data)

const shasum = crypto.createHash('sha1')
shasum.update(stringData)
console.log(shasum.digest('hex'));

// (same as hashlib) outputs f692755b3c38bc6b0dc376d775db8b07d6d5f256

object-hash (Tested with and without stringifying with no success)

const data = {
    key1: 'value1',
    key2: 'value2',
    key3: 'value3',
};

const stringData = JSON.stringify(data)


const objectHash = require('object-hash');
console.log(objectHash.sha1(stringData));

// outputs b5b0a100d7852748fe2e35bf00eeb536ad2d17d1

I saw in the object-hash docs that the package uses crypto, so I don't understand why the two outputs differ.

How can I make object-hash and hashlib/crypto all produce the same sha1 value?


Solution

  • It turns out that object-hash prefixes the value being hashed with its type. For a string, that means writing string:{string_length}: to the hash stream before the string itself.

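    hash = hashlib.sha1()

    # object-hash writes a type prefix before the string itself: string:{length}:
    # (len(json_data) counts bytes, which equals the string length for ASCII-only data)
    hash.update(f'string:{len(json_data)}:'.encode('utf-8'))

    hash.update(json_data)  # json_data is the encoded JSON string from the question
    res = hash.hexdigest()
    print(res)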
    

    With this prefix in place, hashlib produces the same hash as object-hash. The same prefix works with crypto.

    Note: This is not documented and I had to look through the source code to find exactly how to prefix strings in particular. Other types have different prefixes.
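
The prefixing described above can be wrapped in a small helper. This is only a sketch: the function name `object_hash_sha1_of_string` is hypothetical, and it assumes that the length in the prefix is the JavaScript string length (UTF-16 code units) rather than the UTF-8 byte count. The two only differ when the data contains non-ASCII characters. It covers strings only, since other types use other prefixes.

```python
import hashlib
import json


def object_hash_sha1_of_string(text: str) -> str:
    """Hypothetical helper: SHA-1 of `text` with an object-hash style string prefix."""
    # Assumption: the prefix carries the JavaScript string length,
    # i.e. the number of UTF-16 code units, not the number of UTF-8 bytes.
    js_length = len(text.encode('utf-16-le')) // 2
    digest = hashlib.sha1()
    digest.update(f'string:{js_length}:'.encode('utf-8'))
    digest.update(text.encode('utf-8'))
    return digest.hexdigest()


data = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}
# ensure_ascii=False leaves non-ASCII characters unescaped, which is closer
# to what JSON.stringify produces than the json.dumps default.
json_text = json.dumps(data, separators=(',', ':'), ensure_ascii=False)
print(object_hash_sha1_of_string(json_text))
```

For the ASCII-only data in the question this should behave like the snippet above. Objects passed to object-hash directly (not stringified first) are serialized differently, so this helper would not reproduce those hashes.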