Search code examples
pythondistributionprobabilitycdf

Mapping elements of a list to Intervals to create a cumulative probability distribution


I have a list of elements, for example L = [A, B, C]. Each element has an associated score for example S = [5, 1, 4].

I want to select an element from L according to its score in S, simply by generating a sort of cumulative probability distribution, where each element L[i] corresponds to an interval in (0,1] proportional to S[i]. Then, a random number drawn in (0,1] maps to the selected element.

For the example given before, we can have the scores of S represented as probabilities by normalizing on 5+1+4 so we get SS = [0.5, 0.1, 0.4], and we map the elements of L to intervals, such that:

B is mapped to (0, 0.1]
C is mapped to (0.1, 0.1+0.4]
A is mapped to (0.1+0.4, 0.5+0.5]

Now if I generate a random number r in (0,1] (e.g. r = random.random()), it will map to the corresponding element. For example if r = 0.03 we know that the element is B. And for instance, if r = 0.73 we know that the element is A ...

Is there a simple way in python to do this mapping between an element and an interval ?

I know that I can use numpy.cumsum to generate the cumulative sum of SS, but how to map elements to intervals obtained from this cumulative sum ?


Solution

  • Assuming I understand you correctly, I would recommend a dict:

    def branchFinder(L, S, r):
        S, L = map(list, zip(*sorted(zip(S, L))))
        SS = [0.] + map(lambda x: x/10., S)
    
        probs = {}
        for i in range(len(L)):
            probs[L[i]] = (sum(SS[:i+1]), SS[i+1] + sum(SS[:i+1]))        
    
        for key, value in probs.iteritems():
            if value[0] < r <= value[1]:
                 return key