python pandas probability

Is there a Python function for cumulative probability?


I have a dataframe that looks like this:

Id   Day1   Day2   Day3 
1    0.35   0.32   0.29  
2    0.63   0.59   0.58
3    0.12   0.10   0.07

This table shows the probability of a certain event occurring on each day, for each record.

What I'm looking for is a Python function that will give me the cumulative probability of the event occurring on at least one of the days. The output would look like this:

Id   Day1   Day2   Day3  Cum_Prob
1    0.35   0.32   0.29  0.686
2    0.63   0.59   0.58  0.936
3    0.12   0.10   0.07  0.263

The Cum_Prob values in the sample table above are correct, i.e. they are the actual probability of the event occurring on at least one of the 3 days for each Id value.

I can write this function myself for a couple of days. In reality, I'm dealing with a lot more than 3 days, and I believe hand-writing this function for lots of days will be extremely tedious.

Is there a pre-existing function that can calculate this cumulative probability from the individual probabilities? Or is there a quick way to write a UDF for this over an arbitrary number of days?


Solution

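  • Assuming the days are independent, the probability that the event occurs on at least one day is 1 minus the product of the per-day probabilities that it does not occur. So, with a little math, this is just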

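    # row-wise product of the "does not occur" probabilities, then the complement
    1 - (1 - df).prod(axis=1)
    # if your `Id` is not the index, use either of:
    # 1 - (1 - df.filter(like='Day')).prod(axis=1)
    # 1 - (1 - df.set_index('Id')).prod(axis=1)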
    

    Output:

    Id
    1    0.686180
    2    0.936286
    3    0.263440
    dtype: float64
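
For completeness, here is a minimal self-contained sketch of the same idea that adds the result as a `Cum_Prob` column. It assumes the days are independent, that `Id` is an ordinary column, and that every per-day column has `Day` in its name. The `day_cols` variable is just an illustrative name, not something from the question.

```python
import pandas as pd

# Sample data from the question; `Id` is an ordinary column here (assumption).
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Day1": [0.35, 0.63, 0.12],
    "Day2": [0.32, 0.59, 0.10],
    "Day3": [0.29, 0.58, 0.07],
})

# Pick up every per-day column by name, however many there are.
day_cols = df.filter(like="Day").columns

# P(at least one occurrence) = 1 - product of per-day non-occurrence probabilities.
df["Cum_Prob"] = 1 - (1 - df[day_cols]).prod(axis=1)

print(df.round(3))
```

Because the columns are selected by name rather than listed by hand, the same three lines should work unchanged for any number of `Day` columns.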