I am looking for a fast hash with low collisions implemented in JavaScript. It doesn't need to be a crypto hash. I am basically using it as a way to see if a given file has already been uploaded (or partially uploaded) to a user's account to save them some upload time on large (video) files.
I am using the new HTML5 File API to read in slices of the file. I then hand this off to SparkMD5 to give me a hash of the file. I like the fact that SparkMD5 allows me to do an incremental hash so I don't have to read the entire thing in memory.
Overall, SparkMD5 works for my needs but for large files it can take a while to get me my hash (about 30 seconds for a 300MB file). I would ideally like to reduce this. I am not that knowledgeable on hash functions so I am not looking to port something and am ideally looking for an already implemented library.
Update (August 2021): My benchmark predates WebAssembly and is therefore out-of-date. There likely exist hash functions compiled into WebAssembly that beat the pure JS implementations. If somebody wants to update this benchmark, a pull request to my benchmark repo would be most welcome!
I benchmarked various hashing algorithms, and here's the fastest options I found:
If you only need 32 bit digests, use iMurmurHash. Note that this will give you collisions after about 2**14 (16,000) hashes.
Use SparkMD5 if you need more than 32 bits. I didn't find a fast 64 or 128 bit Murmur implementation, but SparkMD5 was surprisingly fast (75 MB/sec).
These recommendations are for pure JavaScript. I benchmarked them with strings, but SparkMD5 takes ArrayBuffers as well.
If you want fast hashing modules on Node, your best options are slightly different:
If you are hashing Buffers: Use the built-in crypto module with the MD5 algorithm.
The exception to this is: If you don't need incremental hashing, and you need more than 500 MB/sec throughput, and you're OK with a native npm dependency, use murmurhash-native for some extra performance. I didn't test digest sizes of less than 128 bit, as even with 128 bits the hashing is so fast that it's not likely to be a bottleneck.
(Note that murmurhash-native technically supports incremental hashing, but it's slower than Node's built-in MD5 algorithm in this mode.)
If you are hashing a single string non-incrementally, convert it to a Buffer and see the preceding bullet point.
If you are hashing strings incrementally:
If you only need 32 bits, use iMurmurHash. Note that this will give you collisions after about 2**14 (16,000) hashes.
If you need more than 32 bits: Use the built-in crypto module with the MD5 algorithm.