Search code examples
algorithmhashchecksumcrc32bijection

Is triple-CRC-32 a bad (or not) idea for generating a non-secure uniform distribution hash?


I have an input of 288 bits (comprising 4 × 32-bit identity function outputs and 10 × 16-bit integers). I need to hash this to 96 bits with as few collisions as possible. The goal could be stated as key compression with probabilistic collisions.

I'm aware that CRC is a bijective hash, thus ensuring 100% even distribution (as I understand it). In my view, I should be able to run 3 parallel CRC paths through the input, resulting in a 96-bit lossy hash (obviously not bijective) of optimum distribution.

However, I'm also aware that CRC is not used for such applications. An algorithm such as MetroHash would typically be used.

Could someone explain to me why CRC is a bad (or not) idea for this application?

Note: This is not intended for anything secure.


Solution

  • Sure, that can work, but there are probably better approaches.

    For it to work, you would need to use three different CRC-32's with three different polynomials. And even then, be careful that they don't have common factors (e.g. x+1), to make sure that there are no correlated bits between the three.

    Better would be an approach like used in xxhash, but extended to 96 bits. That would be faster in software.

    Why 96 bits? That seems like an unnecessarily long hash.