Search code examples
md5theorysha1

Are standard hash functions like MD5 or SHA1 quaranteed to be unique for small input (4 bytes)?


Scenario:

I'm writing web service, that will act like identity provider for 3pty application. I have to send to this 3pty application some unique identifier of our user. In our database, unique user identifier is integer (4 bytes, 32 bites). Per our security rules I can't send those in plain form - so sending them out hashed (trough function like MD5 or SHA1) was my first idea.

Problem:

The result of MD5 is 16 bytes, result of SHA1 is 40 bytes, I know they can't be unique for larger input sets, but given the fact my input set is only 4 bytes long (smaller then hashed results) - are they guaranteed to be unique, or am I doomed to some poor-man hash function (like xoring the integer input with some number, shifting bites, adding predefined bites, etc.) ?


Solution

  • For what you're trying to achieve (preventing a 3rd party from determining your user identifier), a straight MD5 or SHA1 hash is insufficient. 32 bits = about 4 billion values, it would take less than 2 hours for the 3rd party to brute force every value (@1m hashes/sec). I'd really suggest using HMAC-SHA1 instead.

    As for collisions, this question has an extremely good answer on their likelihood. tl;dr For 32-bits of input, a collision is excessively small.

    If your user identifiers aren't random (they increment by 1 or there is a known algorithm for creating them), then there's no reason you can't generate every hash to make sure that no collision will occur.

    This will check the first 10,000,000 integers for a collision with HMAC-SHA1 (will take about 2 minutes to run):

    public static bool checkCollisionHmacSha1(byte[] key){
        HMACSHA1 mac = new HMACSHA1(key);
        HashSet<byte[]> values = new HashSet<byte[]>();
        bool collision = false;
        for(int i = 0; i < 10000000 && collision == false; i++){
            byte[] value = BitConverter.GetBytes(i);
            collision = !values.Add(mac.ComputeHash(value));
            if (collision)
                break;
        }
        return collision;
    }