Search code examples
pythonpandaswindow

How to calculate speed of transitions between two event categories in column in a pandas dataframe?


I have a pandas dataframe

ID api_timestamp event_level
0 A9 2022-02-28 13:20:19.305 NaN
1 A9 2022-02-28 13:20:21.703 L1_1
2 A9 2022-02-28 11:54:55.897 L1_2
3 A9 2022-02-28 12:18:00.882 L1_3
4 B1 2022-02-28 12:18:04.882 L1_1
5 B1 2022-02-28 12:18:05.882 L1_6
6 A9 2022-02-28 12:18:07.882 L2_1
7 A9 2022-02-28 12:18:08.882 L1_8

Here I have events at two levels L1 eg: L1_1,L1_2 and L2

How to calculate number of L2 events between L1 event pairs for each ID

ie Here for ID A9 , there is one L2_1 between L1_3 and L1_8


Solution

  • Idea is to sort the api_timestamp column and find the L2 event between L1 event by comparing the index of L2 event and the max and min index of L1 event.

    df['api_timestamp'] = pd.to_datetime(df['api_timestamp'])
    
    for name, g in df.dropna(subset='event_level').sort_values('api_timestamp', ignore_index=True).groupby('ID'):
        l1_indices = g[g['event_level'].str[:2].eq('L1')].index.tolist()
        l2_indices = g[g['event_level'].str.startswith('L2')].index.tolist()
        l2_between = [l2 for l2 in l2_indices if l2 > min(l1_indices) and l2 < max(l1_indices)]
        print(name, len(l2_between))
    
    A9 1
    B1 0