Search code examples
c++hashhashalgorithm

Fastest and LightWeight Hashing Algorithm for Large Files & 512 KB Chunks [C,Linux,MAC,Windows]


I'm working on a Project which involves computation of Hashes for Files. The Project is like a File Backup Service, So when a file gets uploaded from Client to Server, i need to check if that file is already available in the server. I generate a CRC-32 Hash for the file and then send the hash to server to check if it's already available.

If the file is not in server, i used to send the file as 512 KB Chunks[for Dedupe] and i have to calculate hash for this each 512 KB Chunk. The file sizes may be of few GB's sometimes and multiple clients will connect to the server. So i really need a Fast and LightWeight Hashing algorithm for files. Any ideas ..?

P.S : I have already noticed some Hashing Algorithm questions in StackOverflow, but the answer's not quite comparison of the Hashing Algorithms required exactly for this kind of Task. I bet this will be really useful for a bunch of People.


Solution

  • Actually, CRC32 does not have neither the best speed, neither the best distribution.

    This is to be expected : CRC32 is pretty old by today's standard, and created in an era when CPU were not 32/64 bits wide nor OoO-Ex, also distribution properties were less important than error detection. All these requirements have changed since.

    To evaluate the speed and distribution properties of hash algorithms, Austin Appleby created the excellent SMHasher package. A short summary of results is presented here. I would advise to select an algorithm with a Q.Score of 10 (perfect distribution).