I have a dataframe that looks like this:
Id Day1 Day2 Day3
1 0.35 0.32 0.29
2 0.63 0.59 0.58
3 0.12 0.10 0.07
This table shows the probability of a certain event occurring on each day, for each record.
I'm looking for a Python function that gives me the cumulative probability of the event occurring on at least one of the days. The output would look like this:
Id Day1 Day2 Day3 Cum_Prob
1 0.35 0.32 0.29 0.686
2 0.63 0.59 0.58 0.936
3 0.12 0.10 0.07 0.263
The Cum_Prob values in the above sample table are correct, i.e. they are the actual probability of the event occurring on any of the 3 days for each Id value.
I can write this function myself for a couple of days. In reality, I'm dealing with a lot more than 3 days, and I believe hand-writing this function for lots of days will be extremely tedious.
Is there a pre-existing function that can calculate this probability from the individual probabilities? Or is there a quick way to write a UDF for this over an arbitrary number of days?
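With a little math (and assuming the days are independent), the probability of the event occurring at least once is one minus the probability that it never occurs, so this is just
1 - (1 - df).prod(axis=1)
# if your `Id` is not the index, restrict the calculation to the day columns first:
# 1 - (1 - df.filter(like='Day')).prod(axis=1)
# or move `Id` into the index:
# 1 - (1 - df.set_index('Id')).prod(axis=1)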
Output:
Id
1 0.686180
2 0.936286
3 0.263440
dtype: float64
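For completeness, here is a minimal end-to-end sketch of the same idea using the sample data from the question. It assumes the days are independent, that `Id` is an ordinary column rather than the index, and that every per-day column name contains the text "Day". The variable name `day_cols` is only illustrative.

```python
import pandas as pd

# Sample data from the question; `Id` is a regular column here, not the index.
df = pd.DataFrame({
    "Id": [1, 2, 3],
    "Day1": [0.35, 0.63, 0.12],
    "Day2": [0.32, 0.59, 0.10],
    "Day3": [0.29, 0.58, 0.07],
})

# Pick up every per-day column by its shared "Day" substring,
# so the same code works for any number of days.
day_cols = df.filter(like="Day")

# P(at least once) = 1 - P(never) = 1 - product over days of (1 - p_day),
# which holds only if the days are independent (an assumption).
df["Cum_Prob"] = 1 - (1 - day_cols).prod(axis=1)

print(df.round(3))
```

With the sample data this gives roughly 0.686, 0.936 and 0.263 for Ids 1, 2 and 3, matching the output above. Because the columns are selected by name rather than listed by hand, adding Day4, Day5 and so on needs no code changes.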