Let's assume that we got a list like the one below:
list = [[A,10,3],[B,5,2],[C,8,1]]
For each item in the list there is a probability to be chosen which could be calculated by softmax. for example for first element (A) we have:
from math import exp
A_probability = exp(list[0][2]/list[0][1] /
(exp(list[0][2]/list[0][1]) +
exp(list[1][2]/list[1][1]) +
exp(list[2][2]/list[2][1])))
How can I choose the items in the list randomly according to the calculated probablity for each one?
I'm assuming that you have a pre-calculated list of probabilities (say probs
) for each of the indexes in the list (say data
) you want to choose from.
Further, probs
and data
obviously must have the same length and the entries of probs
must be non-negative numbers summing to 1
.
There is a neat but simple technique to randomly choose the indexes in data
according to the distribution in probs
that is known as the roulette wheel. In Python, I believe, it should look somehow like this
import random
data = ['A', 'B', 'C', 'D']
probs = [0.2, 0.4, 0.3, 0.1]
def roulette_wheel(probs):
rand = random.random()
for slot, prob in enumerate(probs):
rand -= prob
if rand < 0.0:
return slot
Note that this can be generalized to a list of non-negative weights (that doesn't have to add up to 1
) by multiplying rand
by the term sum(weights)
. I believe, I first saw this cute idea in a book about Pascal programming some eons ago.
Edit:
As MadPhysicist suggested in a comment this can be made a lot more efficient if one needs to draw repeatedly from the same data. In that case, one can pre-compute the cumulative distribution function and then just do a binary search for the index such that cumulative prob. <= rand ~ U(0, 1)
. In Python, this could for example look somehow like the following
from random import random
from bisect import bisect_right
def cdf(probs):
cdf = []
total = 0.0
for p in probs:
total += p
cdf.append(total)
return cdf
def roulette_wheel_bisect(cdf):
return bisect_right(cdf, random())
# compute cdf
cumsum = cdf(probs)
# randomly draw 10 indexes
for i in range(0, 10):
print(roulette_wheel_bisect(cumsum))
Disclaimer: I'm not a Python programmer by trade, so the code above should only illustrate the general idea. It may not be very robust for practical uses. You should always use a well-tested standard library, numpy for example, if you can.
Edit2:
I've just learned that numpy
has numpy.random.choice which does exactly what you need. Example:
from numpy import random
data = ['A', 'B', 'C', 'D']
probs = [0.2, 0.4, 0.3, 0.1]
# randomly draw 10 list elements with replacement
for i in range(0, 10):
print(random.choice(data, p=probs))