python, pandas, probability

How do I transform a DataFrame of per-period probabilities into probabilities of at least one occurrence over n periods?


I have a DataFrame of probabilities for different events over a large number of sequential periods, and I want to transform it to show the probability of each event happening at least once over n periods. For example, this is what I have, which corresponds to n = 1:

event | period   | probability
A     | period 1 | 0.6
A     | period 2 | 0.7
A     | period 3 | 0.8
A     | period 4 | 0.85
A     | period 5 | 0.9

And I want to figure out the probability of A occurring at least once across two periods (n = 2), which would be:

A | period 1 | 1-(1-0.6)*(1-0.7)
A | period 2 | 1-(1-0.7)*(1-0.8)
A | period 3 | 1-(1-0.8)*(1-0.85)
A | period 4 | 1-(1-0.85)*(1-0.9)

And n = 3 would be:

A | period 1 | 1-(1-0.6)*(1-0.7)*(1-0.8)
A | period 2 | 1-(1-0.7)*(1-0.8)*(1-0.85)
A | period 3 | 1-(1-0.8)*(1-0.85)*(1-0.9)

Is there a Python / pandas function or term for this?


Solution

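  • You can use groupby with transform, applying a rolling window of size n inside each event and shifting the result up by n - 1 rows so that it lines up with the first period of the window: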

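    import numpy as np
    
    n = 2
    
    # Per event: 1 - product of (1 - p) over n consecutive rows, shifted so each window starts at its own row
    df['new_probability'] = df.groupby('event')['probability'].transform(lambda x: x.rolling(n).agg(lambda y: 1-np.prod(1-y)).shift(-n+1))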
    
    print(df)
    event   period  probability  new_probability
    A  period1         0.60            0.880
    A  period2         0.70            0.940
    A  period3         0.80            0.970
    A  period4         0.85            0.985
    A  period5         0.90              NaN
    

    For n=3:

    n = 3
    
    df['new_probability'] = df.groupby('event')['probability'].transform(lambda x: x.rolling(n).agg(lambda y: 1-np.prod(1-y)).shift(-n+1))
    
    print(df)
    event   period  probability  new_probability
    A  period1         0.60            0.976
    A  period2         0.70            0.991
    A  period3         0.80            0.997
    A  period4         0.85              NaN
    A  period5         0.90              NaN
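
The same idea can be wrapped in a small helper so that n becomes a parameter. This is only a sketch: the function name `at_least_once`, its default column names, and the output column names are made up for illustration, and it assumes the periods are independent (as the formulas in the question do) and that rows are already sorted by period within each event. For each starting period it computes 1 minus the product of (1 - p) over the next n periods.

```python
import numpy as np
import pandas as pd


def at_least_once(df, n, group_col="event", prob_col="probability"):
    """Probability of at least one occurrence in each forward window of n periods.

    Hypothetical helper: assumes independent periods and rows sorted by
    period within each group.
    """
    def forward_window(p):
        # Rolling windows end at the current row, so take the product of the
        # "miss" probabilities (1 - p) over n rows, then shift up by n - 1 so
        # the result sits on the first period of its window.
        miss_all = (1 - p).rolling(n).apply(np.prod, raw=True)
        return (1 - miss_all).shift(-(n - 1))

    return df.groupby(group_col)[prob_col].transform(forward_window)


# Sample data from the question
df = pd.DataFrame({
    "event": ["A"] * 5,
    "period": [f"period {i}" for i in range(1, 6)],
    "probability": [0.6, 0.7, 0.8, 0.85, 0.9],
})

for n in (2, 3):
    df[f"at_least_once_{n}"] = at_least_once(df, n)

print(df)
```

Rows whose forward window runs past the last period of an event come out as NaN, matching the outputs above. If partial windows at the end should be filled instead, that would need a different windowing choice and is not covered by this sketch.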