Search code examples
python-3.xrandomprobability-density

How to choose an item in a list according to a specific probability?


Let's assume that we got a list like the one below:

list = [[A,10,3],[B,5,2],[C,8,1]]

For each item in the list there is a probability to be chosen which could be calculated by softmax. for example for first element (A) we have:

from math import exp

A_probability = exp(list[0][2]/list[0][1] /
                     (exp(list[0][2]/list[0][1]) +
                      exp(list[1][2]/list[1][1]) +
                      exp(list[2][2]/list[2][1])))

How can I choose the items in the list randomly according to the calculated probablity for each one?


Solution

  • I'm assuming that you have a pre-calculated list of probabilities (say probs) for each of the indexes in the list (say data) you want to choose from.

    Further, probs and data obviously must have the same length and the entries of probs must be non-negative numbers summing to 1.

    There is a neat but simple technique to randomly choose the indexes in data according to the distribution in probs that is known as the roulette wheel. In Python, I believe, it should look somehow like this

    import random
    
    data = ['A', 'B', 'C', 'D']
    
    probs = [0.2, 0.4, 0.3, 0.1]
    
    def roulette_wheel(probs):
        rand = random.random()
        for slot, prob in enumerate(probs):
            rand -= prob
            if rand < 0.0:
                return slot
    

    Note that this can be generalized to a list of non-negative weights (that doesn't have to add up to 1) by multiplying rand by the term sum(weights). I believe, I first saw this cute idea in a book about Pascal programming some eons ago.

    Edit:

    As MadPhysicist suggested in a comment this can be made a lot more efficient if one needs to draw repeatedly from the same data. In that case, one can pre-compute the cumulative distribution function and then just do a binary search for the index such that cumulative prob. <= rand ~ U(0, 1). In Python, this could for example look somehow like the following

    from random import random
    from bisect import bisect_right
    
    
    def cdf(probs):
        cdf = []
        total = 0.0
        for p in probs:
            total += p
            cdf.append(total)
        return cdf
    
    
    def roulette_wheel_bisect(cdf):
        return bisect_right(cdf, random())
    
    # compute cdf
    cumsum = cdf(probs)
    
    # randomly draw 10 indexes 
    for i in range(0, 10):
        print(roulette_wheel_bisect(cumsum))
    

    Disclaimer: I'm not a Python programmer by trade, so the code above should only illustrate the general idea. It may not be very robust for practical uses. You should always use a well-tested standard library, numpy for example, if you can.

    Edit2:

    I've just learned that numpy has numpy.random.choice which does exactly what you need. Example:

    from numpy import random
    
    data = ['A', 'B', 'C', 'D']
    probs = [0.2, 0.4, 0.3, 0.1]
    
    # randomly draw 10 list elements with replacement
    for i in range(0, 10):
        print(random.choice(data, p=probs))