
Why Are node scrypt Hashes the Same Given the Same Inputs?


I was trying to find a compare or verify function for node's built-in crypto module, specifically for scrypt, as most password-hashing modules I have used have such a function. Then, I discovered why this was an impossible task: All hashes generated with these algorithms using the same parameters generate the same string (technically buffer). This is the case for many of crypto's hashing functions, including its pbkdf2 implementation.

Why is this safe? Isn't the whole (modern) point of a password/message hashing function that you can't generate the same hash again using the same input? This is how the various bcrypt modules work, as well as the original version of scrypt, from which the built-in version, the one I'm asking about, was derived.

For example:

const crypto = require('crypto');

const key1 = 'my secret key';
const key2 = 'my other secret key';

const salt = 'my salt';

// The synchronous variants return the derived key directly. The async
// callbacks receive (err, derivedKey) and only run after the call returns,
// so their results cannot be compared on the following line.
const scryptHash1 = crypto.scryptSync(key1, salt, 16);
const scryptHash2 = crypto.scryptSync(key1, salt, 16);
const scryptHash3 = crypto.scryptSync(key2, salt, 16);

scryptHash1.toString('hex') === scryptHash2.toString('hex'); // true
scryptHash1.toString('hex') === scryptHash3.toString('hex'); // false

const pbkdfHash1 = crypto.pbkdf2Sync(key1, salt, 16, 16, 'sha256');
const pbkdfHash2 = crypto.pbkdf2Sync(key1, salt, 16, 16, 'sha256');
const pbkdfHash3 = crypto.pbkdf2Sync(key2, salt, 16, 16, 'sha256');

pbkdfHash1.toString('hex') === pbkdfHash2.toString('hex'); // true
pbkdfHash1.toString('hex') === pbkdfHash3.toString('hex'); // false

I originally asked this question on Cryptography, as I'm more concerned about the safety than anything else, as I want to move from bcrypt to scrypt. However, as multiple people pointed out, and as I feared, the question is more about API design. That being said, any accepted answer should include why this method is safe, or safe enough to switch over (granting that "safe enough" is never safe enough). I took security as my major, but I'm now a web dev, and security changes all the time, though the core concepts stay mostly the same.


Solution

  • You seem to have a fundamental misunderstanding about password hashing. First and foremost, just like any hash function, a password hashing function is a function in the mathematical sense. That is, it is simply a mapping that assigns a fixed value from its range to every element of its input domain.

    Two things set password hashes apart from regular hashes. First, they are designed to be slow and/or use large amounts of memory when evaluated. (This is irrelevant for our discussion here.) Second, they take a second input, the salt.

    For a password hashing function H, you want that for any fixed password m and any two salts s ≠ s', not only does H(m,s) ≠ H(m,s') hold (except with negligible probability), but also, given both hash values and both salts, you should not be able to detect that they are hash values of the same m.
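
    As a rough illustration of the first half of that property, the sketch below hashes one password under two random salts with Node's built-in scrypt. The password, the 16-byte salt length, and the 64-byte output length are arbitrary choices made for this example. It only shows that the outputs differ; it does not demonstrate the stronger indistinguishability requirement.

    ```javascript
    const crypto = require('crypto');

    const password = 'correct horse battery staple'; // hypothetical example password

    // Two fresh 16-byte salts from the OS CSPRNG.
    const saltA = crypto.randomBytes(16);
    const saltB = crypto.randomBytes(16);

    // 64-byte derived keys using Node's default scrypt cost parameters.
    const hashA = crypto.scryptSync(password, saltA, 64);
    const hashB = crypto.scryptSync(password, saltB, 64);
    const hashAAgain = crypto.scryptSync(password, saltA, 64);

    // Different salt, different output; identical inputs, identical output.
    const differsAcrossSalts = !hashA.equals(hashB);
    const repeatsForSameSalt = hashA.equals(hashAAgain);

    console.log({ differsAcrossSalts, repeatsForSameSalt });
    ```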

    What you seem to be confused about is a matter of API design, specifically who gets to choose the salt. Every time a new password m is hashed (e.g. to be entered into a database), a fresh, uniformly random salt s should be chosen. The hash value h:=H(m,s) is then computed, and both h and s are stored in the database. Whenever someone claiming to be that same user submits a password m' to authenticate themselves, (h,s) is retrieved and it is checked whether h=H(m',s).

    Now the question is who chooses the salt. It appears that the APIs you are familiar with do not trust the user to do so. So when you make a call to hash a password m, the library will choose a salt s, compute h, and output h'=(h,s) as a "hash value". To check whether a password m' is correct, you then submit h' and m', and the library will extract the salt, recompute the hash, and compare.
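
    A minimal sketch of that style of API, built on Node's scrypt, might look as follows. The helper names `hashPassword` and `verifyPassword` and the `salt:hash` hex encoding are made up for this example; real libraries such as the bcrypt modules use their own encodings.

    ```javascript
    const crypto = require('crypto');

    // Hypothetical helper: chooses the salt itself and returns h' = "salt:hash" in hex.
    function hashPassword(password) {
      const salt = crypto.randomBytes(16);
      const hash = crypto.scryptSync(password, salt, 64);
      return `${salt.toString('hex')}:${hash.toString('hex')}`;
    }

    // Hypothetical helper: extracts the salt from h', recomputes the hash, and compares.
    function verifyPassword(password, stored) {
      const [saltHex, hashHex] = stored.split(':');
      const salt = Buffer.from(saltHex, 'hex');
      const expected = Buffer.from(hashHex, 'hex');
      const actual = crypto.scryptSync(password, salt, expected.length);
      // Both buffers have the same length, as timingSafeEqual requires.
      return crypto.timingSafeEqual(actual, expected);
    }

    // The same password yields a different "hash value" on every call,
    // because a new salt is embedded each time.
    const stored1 = hashPassword('my secret key');
    const stored2 = hashPassword('my secret key');
    ```

    Because the salt is hidden inside the returned string, two calls with the same password produce different strings, which is the behaviour the question expected from scrypt itself.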

    The library you are now looking at expects the user to choose the salt. That is, each time you create a new entry in a password database, you have to choose a new salt, compute h=H(m,s), and store both h and s. Since the library in this case does not attempt to "hide" anything from you, you also need to take care of the comparison yourself.
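
    With Node's built-in module, that flow could be sketched roughly as below. The `users` map stands in for a real database, and `register` and `authenticate` are hypothetical names chosen for this example. The synchronous call is used only to keep the sketch short; the callback-based `crypto.scrypt` avoids blocking the event loop.

    ```javascript
    const crypto = require('crypto');

    // Hypothetical in-memory stand-in for a password database.
    const users = new Map();

    function register(username, password) {
      const salt = crypto.randomBytes(16);                // fresh random salt s per entry
      const hash = crypto.scryptSync(password, salt, 64); // h = H(m, s)
      users.set(username, { salt, hash });                // store both h and s
    }

    function authenticate(username, candidate) {
      const record = users.get(username);
      if (!record) return false;
      // Recompute H(m', s) using the stored salt.
      const candidateHash = crypto.scryptSync(candidate, record.salt, record.hash.length);
      // Constant-time comparison; both buffers have the same length by construction.
      return crypto.timingSafeEqual(candidateHash, record.hash);
    }

    register('alice', 'my secret key');
    ```

    Comparing with `crypto.timingSafeEqual` rather than `===` on strings avoids leaking how many leading bytes matched through timing differences.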