php sha1 pastebin

Sha1 substring question


I am making a pastebin-type site and am trying to make the ID a random string, like paste.com/4RT65L.

I take the SHA-1 of the ID before I add it to the database, but I only keep a substring: the first 8 characters of the SHA-1. Is there a possibility of two different IDs producing the same 8-character substring? I don't want there to accidentally be a second paste with an ID that has already been used.


Solution

  • Well, the odds of a collision among 8-character prefixes are significantly higher than the odds of a collision between two full SHA-1 hashes, but that doesn't mean a collision is likely to happen.

    I would recommend that you do some testing on it. Generate random input and see how long it takes before you have a collision. If you like the results, then go with it. Otherwise, you'll need a longer string.
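The kind of test described above could look something like the following rough Python sketch. The function name `hashes_until_collision`, the use of 16 random bytes as input, and the number of trials are all assumptions made for illustration; the same idea carries over to PHP or any other language.

```python
import hashlib
import os


def hashes_until_collision(prefix_len=8):
    """Hash random inputs until two of them share the same SHA-1 prefix."""
    seen = set()
    while True:
        # 16 random bytes stand in for a paste ID; the input format is an assumption.
        digest = hashlib.sha1(os.urandom(16)).hexdigest()
        short_id = digest[:prefix_len]
        if short_id in seen:
            # Count of hashes generated, including the one that collided.
            return len(seen) + 1
        seen.add(short_id)


if __name__ == "__main__":
    # A handful of trials; each one typically needs tens of thousands of hashes.
    trials = [hashes_until_collision() for _ in range(10)]
    print("hashes until first collision, per trial:", sorted(trials))
    print("average:", sum(trials) / len(trials))
```

Results vary a lot from run to run, so it is worth looking at the spread across trials rather than a single number. Raising `prefix_len` shows how quickly a longer string pushes the first collision out.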

    EDIT: You can also calculate the odds of a collision by looking at the Birthday Paradox.

    Basically, if you are taking the first 8 hex digits from the SHA-1, then you have 16**8 (4,294,967,296) different available combinations.
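As a small illustration of what is being counted, here is a Python sketch that truncates a SHA-1 digest to its first 8 hex digits and prints the size of the resulting ID space. The helper name `short_id` and the sample input are hypothetical.

```python
import hashlib


def short_id(text, length=8):
    """Return the first `length` hex digits of the SHA-1 of `text`."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()[:length]


# Each hex digit has 16 possible values, so 8 digits give 16**8 combinations.
print(short_id("example paste id"))
print(16 ** 8)
```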

    Using an online Birthday Paradox calculator, after about 9,300 hashes you will have a 1% chance of a collision. It will take about 30,000 hashes before you have a 10% chance, and about 77,000 before you have a 50% chance.
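If you would rather not rely on an online calculator, the usual birthday approximation, p ≈ 1 − exp(−n(n−1)/(2N)) for n hashes drawn from N possible values, is easy to evaluate yourself. The Python sketch below assumes uniformly random IDs; the function names are made up for this example.

```python
import math

SPACE = 16 ** 8  # number of distinct 8-hex-digit prefixes


def collision_probability(n, space=SPACE):
    """Approximate chance of at least one collision among n random IDs."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))


def hashes_for_probability(p, space=SPACE):
    """Approximate number of IDs at which the collision chance reaches p."""
    return math.sqrt(2.0 * space * math.log(1.0 / (1.0 - p)))


for p in (0.01, 0.10, 0.50):
    print(f"{p:.0%} chance after about {hashes_for_probability(p):,.0f} hashes")
```

Changing `SPACE` to, say, `16 ** 10` or `16 ** 12` shows how much headroom a longer prefix buys for the traffic you expect.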

    It's important to point out that as long as your hash function does a decent job of being pseudo-random, it doesn't matter which one you use (SHA-1, MD5, or any form of checksum). These numbers assume completely random values, so a real hash function can only approach them, and the better the hash function, the closer it gets.

    So in the end, it depends on how much traffic you are expecting. If this is a small site, you can probably get away with it. If you expect a large amount of traffic, then your odds of a collision are very high.