Given a dataframe, how do I calculate the probability of consecutive events using Python pandas?
For example,
Time | A | B | C |
---|---|---|---|
1 | 1 | 1 | 1 |
2 | -1 | -1 | -1 |
3 | 1 | 1 | 1 |
4 | -1 | -1 | -1 |
5 | 1 | 1 | 1 |
6 | -1 | -1 | -1 |
7 | 1 | 1 | 1 |
8 | -1 | 1 | 1 |
9 | 1 | -1 | 1 |
10 | -1 | 1 | -1 |
In this dataframe, B has two consecutive "1"s at t=7 and t=8, and C has three consecutive "1"s from t=7 to t=9.
The probability that two consecutive "1"s appear is 3/27 (3 runs out of 9 possible length-2 windows in each of the 3 columns), and the probability that three consecutive "1"s appear is 1/24 (1 run out of 8 × 3 = 24 length-3 windows).
How can I do this using Python pandas?
Try this code. It also works on other dataframes, e.g. ones with more columns or rows.
import pandas as pd

# Example data from the question; replace it with your own dataframe.
df = pd.DataFrame({
    'Time': list(range(1, 11)),
    'A': [1, -1, 1, -1, 1, -1, 1, -1, 1, -1],
    'B': [1, -1, 1, -1, 1, -1, 1, 1, -1, 1],
    'C': [1, -1, 1, -1, 1, -1, 1, 1, 1, -1]
})

def consecutive(num):
    '''Return the probability of `num` consecutive 1s, using the module-level df.'''
    row_num = df.shape[0]
    col_num = df.shape[1]
    cnt = 0  # the number of runs of `num` consecutive 1s
    for col_index in range(1, col_num):  # count in each column, skipping 'Time'
        col_tmp = df.iloc[:, col_index]
        consec = 0
        for i in range(row_num):
            if col_tmp.iloc[i] == 1:
                consec += 1
            else:
                # any other value breaks the run
                consec = 0
            if consec == num:
                cnt += 1
                # step back by one so that overlapping runs are also counted
                consec -= 1
    all_cases = (row_num - num + 1) * (col_num - 1)  # col_num - 1 because of the 'Time' column
    prob = cnt / all_cases
    return prob
When you run it on the given dataframe with this code

print(df)
print(f'two consecutive : {consecutive(2)}')
print(f'three consecutive : {consecutive(3)}')

Output :

   Time  A  B  C
0     1  1  1  1
1     2 -1 -1 -1
2     3  1  1  1
3     4 -1 -1 -1
4     5  1  1  1
5     6 -1 -1 -1
6     7  1  1  1
7     8 -1  1  1
8     9  1 -1  1
9    10 -1  1 -1
two consecutive : 0.1111111111111111
three consecutive : 0.041666666666666664