Search code examples
cryptographysha1hash-collision

Does the length of the text to be hashed affect the probability of a collision?


I want to make unique code with sha1 with unique salt (definitely unique because i use id from table). I need unique code with 7 character but i can't use id from my table because it's more than 7 character. So i think it's better to use sha1 and take first 7 character of hashed text as my unique code. Does the length of the text to be hashed affect the probability of a collision?

for example :

textA = "myId:12345678"
textB = "myId:12345678, myId2:87654321"

Is it better to use long short text (e.g textA) or text (e.g textB)? Or does the text length have no effect?


Solution

  • Hashing your unique IDs is a bad idea. Don't do it.

    If your IDs consist of 7 hex digits, that gives you 268,435,456 unique values to play with. If you hash these values and truncate the results to 7 hex digits, then the probability of a collision rises very quickly (1% probability after 2,322 inputs, 50% after 19,290 inputs, 99% after 35,159 inputs).

    If your aim is to disguise these ID values so that an adversary is unable to figure out what the actual sequential values are, then use format-preserving encryption instead.

    Edit: If you need something secure, you could try implementing a Feistel network using an encryption function like AES as the round function. (AES is overkill, perhaps, but this could be implemented in a few lines of code in most languages based on existing primitives.) If you just want to obfuscate the IDs, you could use something simpler like this:

    def obfuscate_id(id, key):
        # Transforms id using a 28-bit key
        a = 81883721   # (random prime congruent to 1 mod 4)
        c = 2791751    # (any odd number will do)
        m = 2**28      # (modulus for 7-digit hex values)
        return ((id ^ key) * a + c) % m