Sampling keys due to their values

I have a dictionary in python with key->value as str->int. If I have to chose a key based on it's own value, then as the value gets larger the key has a lower possibility of being chosen.

For example, if key1=2 and key2->1, then the attitude of key1 should be 2:1.

How can I do this?

Solution

1. Construct a CDF-like list like this:

def build_cdf(distrib):
    cdf = []
    val = 0
    for key, freq in distrib.items():
        val += freq
        cdf.append((val, key))
    return (val, cdf)

This function returns a tuple, the 1st value is the sum of probabilities, and 2nd value is the CDF.

2. Construct the sampler like this:

import random
def sample_from_cdf(val_and_cdf):
    (val, cdf) = val_and_cdf;
    rand = random.uniform(0, val)
    # use bisect.bisect_left to reduce search time from O(n) to O(log n).
    return [key for index, key in cdf if index > rand][0]

Usage:

x = build_cdf({"a":0.2, "b":0.3, "c":0.5});
y = [sample_from_cdf(x) for i in range(0,100000)];
print (len([t for t in y if t == "a"]))   # 19864
print (len([t for t in y if t == "b"]))   # 29760
print (len([t for t in y if t == "c"]))   # 50376

You may want to make this into a class.