Search code examples
c++hashperfect-hash

Perfect hash function generator for functions


I have a set of C++ functions. I want to map this functions in an hash table, something like: unordered_map<function<ReturnType (Args...)> , SomethingElse>, where SomethingElse is not relevant for this question.

This set of functions is previously known, small (let say less than 50) and static (is not gonna change).

Since lookup performance is crucial (should be performed in O(1)), I want to define a perfect hashing function.

There exists a perfect hash function generator for this scenario?

I know that there exists perfect hashing functions generators (like GPERF or CMPH) but since I've never used them, I don't know if they're suitable for my case.

REASON:

I'm trying to design a framework where, given a program written in C++, the user can select a subset F of the functions defined in this program.

For each f belonging to F, the framework implements a memoization strategy: when we call f with input i, we store (i,o) inside some data structure. So, if we are going to call AGAIN f with i, we will return o without performing again the (time expensive) computation.

The "already computed results" will be shared among different users (maybe on the cloud), so if user u1 has already computed o, user u2 will save computing time calling f with i (using the same annotation of before).

Obviously, we need to store the set of pairs (f,inputs_sets) (where inputs_sets is the already computed results set that I talked before), which is the original question: how do I do it?

So, using the "enumeration trick" proposed in the comments in this scenario could be a solution, assuming that the all the users use the exactly same enumeration, which could be a problem: supposing that our program has f1,f2,f3 what if u1 wants to memoize only f1 and f2 (so F={f1,f2}), while u2 wants to memoize only f3 (so F={f3})? An overkill solution could be to enumerate all the functions defined in the program, but this could generate an huge waste of memory.


Solution

  • Ok, maybe not what you want to hear but consider this: since you talk about a few functions, less than 50, the hash lookup should be negligible, even with collisions. Have you actually profiled and saw that the lookup is critical?

    So my advise is to focus your energy on something else, most likely a perfect hash function would not bring any kind of improved performance in your case.

    I am going to go one step further and say that I think that for less than 50 elements a flat map (good ol' vector) would have similar performance (or maybe even better due to cache locality). But again, measurements are required.