Search code examples
pythonpandastime-series

Identifying events in timeseries Pandas dataframe


I am trying to identify events in a timeseries Pandas dataframe. An event is when a value is non-zero for more than 30 seconds. An event can contain values that equal 0 as long as the values are not 0 for a consecutive 30 seconds or longer. If an event is shorter than 30 seconds and is surrounded by zeroes, it is not an event. An event ends at the last non-zero value where the following values are zero for 30 seconds or more. I want the output to look something like the Events column in the reprex.

Reprex:

import pandas as pd


Timestamp = pd.date_range("11-30-2023 23:54:00", periods = 63, freq = "5s")
Value=[0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.0]
Events = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0] 
df = pd.DataFrame({"Timestamp":Timestamp, "Value":Value, "Events":Events})

Solution

  • First identify the chunks of ≥30s of 0s, mask them, then form groups in between and determine if the duration is ≥30s:

    # identify null values
    m = df['Value'].eq(0)
    # group consecutive
    group = (~m).cumsum()
    
    # compute the duration of the chunks of zeros
    zero_chunks = df.loc[m, 'Timestamp'].groupby(group).agg(np.ptp)
    
    # identify chunks ≥ 30s
    zero_chunks_gt_30s = zero_chunks[zero_chunks.ge('30s')].index
    
    # identify external zeros
    external_zeros = m.cummin() | m[::-1].cummin()
    
    # exclude 0s that are external or part of a chunks ≥ 30s
    excluded = group.isin(zero_chunks_gt_30s)&m|external_zeros
    
    # form groups in between, identify those ≥ 30s
    df['Events'] = (df.loc[~excluded, 'Timestamp']
                       .groupby(excluded.cumsum())
                       .transform(lambda x: np.ptp(x)>=pd.Timedelta('30s'))
                       .reindex(df.index, fill_value=0)
                       .astype(int)
                    )
    

    NB. the output is identical to the provided one.

    Output:

                 Timestamp  Value  Events
    0  2023-11-30 23:54:00    0.5       1
    1  2023-11-30 23:54:05    0.5       1
    2  2023-11-30 23:54:10    0.5       1
    3  2023-11-30 23:54:15    0.5       1
    4  2023-11-30 23:54:20    0.5       1
    5  2023-11-30 23:54:25    0.5       1
    6  2023-11-30 23:54:30    0.5       1
    7  2023-11-30 23:54:35    0.5       1
    8  2023-11-30 23:54:40    0.0       1
    9  2023-11-30 23:54:45    0.0       1
    10 2023-11-30 23:54:50    0.0       1
    11 2023-11-30 23:54:55    0.5       1
    12 2023-11-30 23:55:00    0.5       1
    13 2023-11-30 23:55:05    0.5       1
    14 2023-11-30 23:55:10    0.5       1
    15 2023-11-30 23:55:15    0.5       1
    16 2023-11-30 23:55:20    0.5       1
    17 2023-11-30 23:55:25    0.5       1
    18 2023-11-30 23:55:30    0.0       1
    19 2023-11-30 23:55:35    0.0       1
    20 2023-11-30 23:55:40    0.5       1
    21 2023-11-30 23:55:45    0.5       1
    22 2023-11-30 23:55:50    0.5       1
    23 2023-11-30 23:55:55    0.5       1
    24 2023-11-30 23:56:00    0.5       1
    25 2023-11-30 23:56:05    0.5       1
    26 2023-11-30 23:56:10    0.0       0
    27 2023-11-30 23:56:15    0.0       0
    28 2023-11-30 23:56:20    0.0       0
    29 2023-11-30 23:56:25    0.0       0
    30 2023-11-30 23:56:30    0.0       0
    31 2023-11-30 23:56:35    0.0       0
    32 2023-11-30 23:56:40    0.0       0
    33 2023-11-30 23:56:45    0.0       0
    34 2023-11-30 23:56:50    0.0       0
    35 2023-11-30 23:56:55    0.0       0
    36 2023-11-30 23:57:00    0.5       0
    37 2023-11-30 23:57:05    0.5       0
    38 2023-11-30 23:57:10    0.0       0
    39 2023-11-30 23:57:15    0.0       0
    40 2023-11-30 23:57:20    0.0       0
    41 2023-11-30 23:57:25    0.0       0
    42 2023-11-30 23:57:30    0.0       0
    43 2023-11-30 23:57:35    0.0       0
    44 2023-11-30 23:57:40    0.0       0
    45 2023-11-30 23:57:45    0.0       0
    46 2023-11-30 23:57:50    0.0       0
    47 2023-11-30 23:57:55    0.0       0
    48 2023-11-30 23:58:00    0.5       1
    49 2023-11-30 23:58:05    0.5       1
    50 2023-11-30 23:58:10    0.5       1
    51 2023-11-30 23:58:15    0.5       1
    52 2023-11-30 23:58:20    0.5       1
    53 2023-11-30 23:58:25    0.5       1
    54 2023-11-30 23:58:30    0.5       1
    55 2023-11-30 23:58:35    0.5       1
    56 2023-11-30 23:58:40    0.5       1
    57 2023-11-30 23:58:45    0.5       1
    58 2023-11-30 23:58:50    0.5       1
    59 2023-11-30 23:58:55    0.0       0
    60 2023-11-30 23:59:00    0.0       0
    61 2023-11-30 23:59:05    0.0       0
    62 2023-11-30 23:59:10    0.0       0