Consider the following code:
def xorHash(n):
mask = random.getrandbits(32)
def xorIt(x):
return (hash(x) ^ mask) % n
return xorIt
This returns a random hash function which maps elements into a number in {0,1,...,rng-1}.
I want to create a random hash function that maps each element into exactly k
elements in {0,1,...,rng-1} (without repetitions). The example above does the job for k=1.
What is the most efficient way of creating a random hash function which returns a k-sized random subset of {0,1,...,rng-1}?
Seed an RNG with an ordinary integer-valued randomized hash of your data and use it to draw a random sample from the desired range:
def generate_randomized_set_valued_hash_function(n, k):
hashfunc = generate_randomized_hash_function()
def set_valued_hashfunc(x):
rng = random.Random(hashfunc(x))
return set(rng.sample(xrange(n), k))
return set_valued_hashfunc
What RNG and what integer-valued hash function you choose will depend on how strong and how fast you need your set-valued hash function to be.