Search code examples
performancelookupperfect-hash

Create perfect hash for millions of items - result just needs to be "exists or not"


Does anyone know of a good library (windows) that will allow me to create a static (not runtime) perfect hash for millions of items (probably about 10m)?

I essentially have millions of sets of strings and I want to know at a minimal O(1) if a string is in my set or not - that's it. I don't need it to actually look up the string - there's no value behind it (other than the existance).


Solution

  • Try:

    perfect and gperf produce tables in C code form, which should work fine on Windows. I don't know what CMPH's output is.

    CMPH has a comment saying:

    gperf is a bit different, since it was conceived to create very fast perfect hash functions for small sets of keys and CMPH Library was conceived to create minimal perfect hash functions for very large sets of keys.

    If that's correct, then with your million-key case, you should probably prefer CMPH to gperf. I don't know how they compare to Jenkins's perfect. It should be simple enough to try all three and benchmark them against each other.