Search code examples
pythonrandomsetsampling

Creating random hash function that returns a set


Consider the following code:

def xorHash(n):
    mask = random.getrandbits(32)
    def xorIt(x):
        return (hash(x) ^ mask) % n
    return xorIt  

This returns a random hash function which maps elements into a number in {0,1,...,rng-1}.

I want to create a random hash function that maps each element into exactly k elements in {0,1,...,rng-1} (without repetitions). The example above does the job for k=1.

What is the most efficient way of creating a random hash function which returns a k-sized random subset of {0,1,...,rng-1}?


Solution

  • Seed an RNG with an ordinary integer-valued randomized hash of your data and use it to draw a random sample from the desired range:

    def generate_randomized_set_valued_hash_function(n, k):
        hashfunc = generate_randomized_hash_function()
        def set_valued_hashfunc(x):
            rng = random.Random(hashfunc(x))
            return set(rng.sample(xrange(n), k))
        return set_valued_hashfunc
    

    What RNG and what integer-valued hash function you choose will depend on how strong and how fast you need your set-valued hash function to be.