Search code examples
pythonpython-itertools

summing dict values up to a threshold - itertools.takewhile?


A problem that frequently occurs in stochastic simulation is calculating which of several events occurs. This is typically done by having a probability for each event. Then generating a random number and then iterating through the possible events until finding where the sum of their probabilities exceeds the random number.

Here is code that does what I want, but I'm after a better way to do it.

import random

def getEvent(eventProbabilities,probability):
    S = 0
    for key in eventProbabilities.keys():
        S += eventProbabilities[key]
        if S>= probability:   #key is the event to happen.
            break
    return key

x = {'event1' : 0.1 , 'event2' : 0.2 , 'event3' : 0.2 , 'event4' : 0.4, 'event5' : 0.1} #values sum to 1.
p = random.random()  #random number between 0 and 1.
event = getEvent(x,p)
print p, event

I feel like there's got to be a more compact way to define getEvent - likely using takewhile - but I can't find it.

I'm after maximum efficiency, because I think this is where my code is going to spend most of its time.

Is there a way to make this more efficient through itertools (or otherwise)?

Edit added below to comment on whether we care about the order of iteration.

You'll notice that the keys will be processed in some order. That order is independent of the random number p. Let's consider the case where it processes them in the order I listed them. If p<=0.1 then it will return event1 So with 10% chance it's event1. If 0.1<p<0.3 it'll return event2 (so 20% probability). Now lets consider a different order. Say it goes in reverse order. In this case if 0.9<p we'll get event1 (10%), while if 0.7<p<=0.9 we get event2 (again 20%). So the probabilities of each event are the same, regardless of the order.

I just want to select 1 event, and I want that event selected with the corresponding probability.


Solution

  • If I understand you correctly, you simply want to select an element from a list with a given probability. Why not use numpy?

    >>> x = {'event1': 0.1 , 'event2': 0.2 , 'event3': 0.2 , 'event4': 0.4, 
             'event5': 0.1}
    >>> events, p = zip(*x.items())
    >>> np.random.choice(events, p=p, replace=False)
    'event4'