Search code examples
pythondictionarysampling

Sampling keys due to their values


I have a dictionary in python with key->value as str->int. If I have to chose a key based on it's own value, then as the value gets larger the key has a lower possibility of being chosen.

For example, if key1=2 and key2->1, then the attitude of key1 should be 2:1.

How can I do this?


Solution

  • 1. Construct a CDF-like list like this:

    def build_cdf(distrib):
        cdf = []
        val = 0
        for key, freq in distrib.items():
            val += freq
            cdf.append((val, key))
        return (val, cdf)
    

    This function returns a tuple, the 1st value is the sum of probabilities, and 2nd value is the CDF.

    2. Construct the sampler like this:

    import random
    def sample_from_cdf(val_and_cdf):
        (val, cdf) = val_and_cdf;
        rand = random.uniform(0, val)
        # use bisect.bisect_left to reduce search time from O(n) to O(log n).
        return [key for index, key in cdf if index > rand][0]
    

    Usage:

    x = build_cdf({"a":0.2, "b":0.3, "c":0.5});
    y = [sample_from_cdf(x) for i in range(0,100000)];
    print (len([t for t in y if t == "a"]))   # 19864
    print (len([t for t in y if t == "b"]))   # 29760
    print (len([t for t in y if t == "c"]))   # 50376
    

    You may want to make this into a class.