
Calculating probability of consecutive events with python pandas


Given a dataframe, how do I calculate the probability of consecutive events using python pandas?

For example,

Time A B C
1 1 1 1
2 -1 -1 -1
3 1 1 1
4 -1 -1 -1
5 1 1 1
6 -1 -1 -1
7 1 1 1
8 -1 1 1
9 1 -1 1
10 -1 1 -1

In this dataframe, B has two consecutive "1"s at t=7 and t=8, and C has three consecutive "1"s from t=7 to t=9.

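The probability that two consecutive "1"s appear is 3/27 (3 such pairs out of 9 possible two-row windows in each of the 3 columns). The probability that three consecutive "1"s appear is 1/24 (1 such triple out of 8 possible three-row windows in each of the 3 columns).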
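The two fractions can be checked by brute force. The following is only a sketch in plain Python: the helper name `count_windows` and the `columns` dictionary are made up for illustration, and the data is copied from the table above.

```python
# Event columns copied from the example table (the Time column is left out).
columns = {
    'A': [1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
    'B': [1, -1, 1, -1, 1, -1, 1, 1, -1, 1],
    'C': [1, -1, 1, -1, 1, -1, 1, 1, 1, -1],
}


def count_windows(columns, num):
    """Return (hits, total): windows of `num` rows that are all 1, and all windows."""
    hits = 0
    total = 0
    for values in columns.values():
        # A column of n rows has n - num + 1 overlapping windows of length num.
        for start in range(len(values) - num + 1):
            total += 1
            if all(v == 1 for v in values[start:start + num]):
                hits += 1
    return hits, total


print(count_windows(columns, 2))  # (3, 27)
print(count_windows(columns, 3))  # (1, 24)
```

Overlapping windows are counted separately here, which is why the run of three "1"s in C contributes two hits for pairs.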

How can I do this using python pandas?


Solution

  • Try this code. It also works on other dataframes, e.g. ones with more columns or rows.

    import pandas as pd

    df = pd.DataFrame({
        'Time': list(range(1, 11)),
        'A': [1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
        'B': [1, -1, 1, -1, 1, -1, 1, 1, -1, 1],
        'C': [1, -1, 1, -1, 1, -1, 1, 1, 1, -1]
        })
    print(df)


    def consecutive(df, num):
        '''Return the fraction of num-row windows that contain only 1s.

        The first column of df is assumed to be 'Time' and is skipped.
        '''
        row_num = df.shape[0]
        col_num = df.shape[1]

        cnt = 0   # number of windows with num consecutive 1s
        for col_index in range(1, col_num):    # count in each event column
            col_tmp = df.iloc[:, col_index].to_numpy()
            consec = 0   # length of the current run of 1s
            for i in range(row_num):

                if col_tmp[i] == 1:
                    consec += 1

                # any other value (e.g. -1) breaks the run
                else:
                    consec = 0

                # a window of num 1s ends at row i: count it, then subtract 1
                # so that the next overlapping window can be counted as well
                if consec == num:
                    cnt += 1
                    consec -= 1

        # windows per column times the number of event columns
        # (col_num - 1 because of the 'Time' column)
        all_cases = (row_num - num + 1) * (col_num - 1)

        prob = cnt / all_cases

        return prob
    

    When you run it on the given dataframe with this code

    print(f'two consecutive : {consecutive(df, 2)}')
    print(f'three consecutive : {consecutive(df, 3)}')
    

    Output:

       Time  A  B  C
    0     1  1  1  1
    1     2 -1 -1 -1
    2     3  1  1  1
    3     4 -1 -1 -1
    4     5  1  1  1
    5     6 -1 -1 -1
    6     7  1  1  1
    7     8 -1  1  1
    8     9  1 -1  1
    9    10 -1  1 -1
    two consecutive : 0.1111111111111111
    three consecutive : 0.041666666666666664
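
As a possible alternative to the explicit loops, the same count can be sketched with a rolling window. This is not the answer's method: the function name `consecutive_rolling` and the `time_col` parameter are assumptions introduced for illustration. It uses the fact that a rolling sum over 0/1 flags equals the window length only when every row in the window is 1.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': list(range(1, 11)),
    'A': [1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
    'B': [1, -1, 1, -1, 1, -1, 1, 1, -1, 1],
    'C': [1, -1, 1, -1, 1, -1, 1, 1, 1, -1],
})


def consecutive_rolling(frame, num, time_col='Time'):
    """Fraction of num-row windows, over all event columns, that are all 1s."""
    # 1 where the event is 1, 0 otherwise; the time column is dropped.
    events = frame.drop(columns=time_col).eq(1).astype(int)
    windows = (len(events) - num + 1) * events.shape[1]
    if windows <= 0:
        raise ValueError('num is larger than the number of rows')
    # The rolling sum is NaN for the first num - 1 rows, so those never match.
    hits = events.rolling(num).sum().eq(num)
    return hits.to_numpy().sum() / windows


print(f'two consecutive : {consecutive_rolling(df, 2)}')
print(f'three consecutive : {consecutive_rolling(df, 3)}')
```

Like the loop version, this counts overlapping windows separately and divides by the total number of windows across all event columns.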