Is there a function in Python which gives an output based on a probability distribution?


I have an array of numbers, say from 0 to 8, each of which may appear repeatedly in the array. I need to choose one of those numbers, and the probability of a number being chosen should be proportional to the number of times it appears in the array.

This is the original array: [7, 0, 7, 8, 4, 4, 6, 5, 2, 6, 0, 1, 2, 3, 4, 5, 6, 7, 8]

This is the array containing the number of times each number appears in the array:
array([ 2., 1., 3., 1., 1., 4., 1., 5., 1.])

This is the code with which I tried to pick one index (of deg) in the way I described above:

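import numpy as np

# Wrapped in a function (the name pick_index is illustrative) so that return is valid.
def pick_index(deg):
    tot = sum(deg)
    n = np.random.uniform(0, tot)  # uniform draw between 0 and tot
    for i in range(len(deg)):
        if n < deg[i]:  # n falls inside the slice belonging to index i
            return i
        n = n - deg[i]  # otherwise skip past that slice
    return i  # fallback, reachable only through floating-point rounding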

I get an index, 2, but I don't know whether the process is probabilistically correct. What do you say?


Solution

  • You can simply use random.choice on the original array. The probability of choosing an element is automatically proportional to the number of times it appears, because the selected index is uniformly distributed. There is no need to compute deg.

    As pointed out in the comments, you also have the option of using random.choices (available since Python 3.6), which not only lets you collect multiple samples with replacement, but also lets you assign the weight of each element manually.

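    For example, the following three calls all sample from the same distribution. The two weighted calls should return the same three elements for a given seed; the unweighted call on x follows the same probabilities but, in CPython's implementation, maps random numbers to elements differently, so its output for that seed may differ: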

     import random

     x = [1, 2, 3, 2, 7, 7, 7, 7]
     y = [1, 2, 3, 7]
     z = [1, 2, 1, 4]
     w = [0.125, 0.25, 0.125, 0.5]
    
     random.choices(x, k=3)
     random.choices(y, weights=z, k=3)
     random.choices(y, weights=w, k=3)
    

    To go from x to y and z, use collections.Counter:

     import collections

     c = collections.Counter(x)  # maps each distinct value to its count
     y, z = map(list, zip(*c.items()))  # y: distinct values, z: their counts
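
To address the question of whether the loop is probabilistically sound, here is a minimal empirical check. It is a sketch, not a proof: it restates the cumulative-subtraction loop from the question as a function (the name pick_index and the rng parameter are assumptions made for illustration), draws many samples with it, and compares the observed frequencies with both the expected proportions deg[i] / sum(deg) and the frequencies produced by random.choices with the same weights. The seed and the number of draws are arbitrary.

```python
import random
from collections import Counter


def pick_index(deg, rng):
    """Cumulative-subtraction loop from the question (hypothetical name)."""
    n = rng.uniform(0, sum(deg))
    for i, d in enumerate(deg):
        if n < d:
            return i
        n -= d
    return len(deg) - 1  # only reachable through floating-point rounding


deg = [2, 1, 3, 1, 1, 4, 1, 5, 1]  # counts given in the question
rng = random.Random(0)             # seeded only to make the run repeatable
n_draws = 100_000                  # arbitrary sample size

# Frequencies from the hand-written loop and from the library call.
loop_freq = Counter(pick_index(deg, rng) for _ in range(n_draws))
lib_freq = Counter(rng.choices(range(len(deg)), weights=deg, k=n_draws))

# Columns: index, expected proportion, loop frequency, random.choices frequency.
total = sum(deg)
for i, d in enumerate(deg):
    print(i,
          round(d / total, 3),
          round(loop_freq[i] / n_draws, 3),
          round(lib_freq[i] / n_draws, 3))
```

If the three numeric columns agree to within sampling noise, the loop selects index i with probability proportional to deg[i], which is what the question asks for. In practice random.choice or random.choices is still the shorter option.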