Search code examples
algorithmrangelookup-tablesperfect-hash

Create a 'perfect hash function' for contiguous ranges


I'm looking for a way to 'project' a series of 'ranges' into a series of values. Applications would be histograms with uneven bins or creation of lookup tables.

An example:

0 to 14 => 0

15 to 234 => 1

235 => 2

236 to 255 => 3

The actual order of 'result' values (0, 1, 2, 3) on the right don't really matter as long as they are between 0 and 3, so that I can use a small lookup table after. It would be even better if I could make this work on floating point values (on the left).

I know I could use an 8-bit lookup table here and repeat values but I'd like to find a way to 'perfect hash' this: through a series of systematic operations (as small as possible, no branches) compute the right from the left values, to have the smallest possible result space.

I can't seem to find the correct series of magic Google incantations for this kind of algorithm.

The computing duration of this 'hash' or 'projection' function can be in the days if necessary.


Solution

  • If you have n ranges without overlap or gaps you can generate some simple code using O(log2(n)) instructions to to this lookup. I will demonstrate with some Python code.

    # Must be a power of two, extend with zeros on the right if needed.
    lookup = [0, 15, 235, 236]
    
    def index(x):
        i = 0
        # ceil(log2(len(lookup))) iterations in the following pattern.
        # We only need 2 iterations here.
        # ...
        # i += (x >= lookup[i+8]) << 4
        # i += (x >= lookup[i+4]) << 3
        i += (x >= lookup[i+2]) << 2
        i += (x >= lookup[i+1]) << 1
        return i
    

    This is known as a branchless binary search. E.g. even if you have 216 ranges, this would only require 32 additions and 16 table lookups, comparisons, and bitshifts to compute the exact index.