Search code examples
algorithmhashcrc16

How to avoid crc16 collisions?


I am experiencing a pretty annoying issue, using a crc16 hash to manage some of my informations.

In my application, I pass some information into an url parameter, a huge encoded context. That context allow the users to recover their old searches. In that context, i have some elements I hash to be sure it won't take too much characters.

It seems that some elements return the same hash (crc16 algorithm).

I take the has and transform it to a string : crc.ToString("X4"); For example, two different elements gives me : 5A8E.

I tried to use a crc32, but if I do that, the old context won't be recognize.

Do you have any idea how i can find a solution to that ? Thanks a lot


Solution

  • Even if CRC16 was an ideal hash function (which it's not), with just 16 bits, the Birthday Paradox means that there's around a 50% chance of a hash collision in a set of just 2^8 = 256 items. You almost certainly need more bits.

    You can't keep the old hashes working and make them distinguish existing collisions -- that's a contradiction. But you can implement a new, better hashing scheme, add a flag to the URL parameters to indicate that you're using this new scheme, make sure that all your pages generate only these new-style URLs, and "grandfather in" the old-style URLs (which will continue to produce the same collisions as before). I'd suggest giving users a big, bright message to update their bookmarks, and auto-redirecting the page, whenever you get an old-style URL.