I am trying to identify events in a timeseries Pandas dataframe. An event is when a value is non-zero for more than 30 seconds. An event can contain values that equal 0 as long as the values are not 0 for a consecutive 30 seconds or longer. If an event is shorter than 30 seconds and is surrounded by zeroes, it is not an event. An event ends at the last non-zero value where the following values are zero for 30 seconds or more. I want the output to look something like the Events column in the reprex.
Reprex:
import pandas as pd
Timestamp = pd.date_range("11-30-2023 23:54:00", periods = 63, freq = "5s")
Value=[0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.5,0.0,0.0,0.0,0.0]
Events = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0]
df = pd.DataFrame({"Timestamp":Timestamp, "Value":Value, "Events":Events})
First identify the chunks of ≥30s of 0s, mask them, then form groups in between and determine if the duration is ≥30s:
# identify null values
m = df['Value'].eq(0)
# group consecutive
group = (~m).cumsum()
# compute the duration of the chunks of zeros
zero_chunks = df.loc[m, 'Timestamp'].groupby(group).agg(np.ptp)
# identify chunks ≥ 30s
zero_chunks_gt_30s = zero_chunks[zero_chunks.ge('30s')].index
# identify external zeros
external_zeros = m.cummin() | m[::-1].cummin()
# exclude 0s that are external or part of a chunks ≥ 30s
excluded = group.isin(zero_chunks_gt_30s)&m|external_zeros
# form groups in between, identify those ≥ 30s
df['Events'] = (df.loc[~excluded, 'Timestamp']
.groupby(excluded.cumsum())
.transform(lambda x: np.ptp(x)>=pd.Timedelta('30s'))
.reindex(df.index, fill_value=0)
.astype(int)
)
NB. the output is identical to the provided one.
Output:
Timestamp Value Events
0 2023-11-30 23:54:00 0.5 1
1 2023-11-30 23:54:05 0.5 1
2 2023-11-30 23:54:10 0.5 1
3 2023-11-30 23:54:15 0.5 1
4 2023-11-30 23:54:20 0.5 1
5 2023-11-30 23:54:25 0.5 1
6 2023-11-30 23:54:30 0.5 1
7 2023-11-30 23:54:35 0.5 1
8 2023-11-30 23:54:40 0.0 1
9 2023-11-30 23:54:45 0.0 1
10 2023-11-30 23:54:50 0.0 1
11 2023-11-30 23:54:55 0.5 1
12 2023-11-30 23:55:00 0.5 1
13 2023-11-30 23:55:05 0.5 1
14 2023-11-30 23:55:10 0.5 1
15 2023-11-30 23:55:15 0.5 1
16 2023-11-30 23:55:20 0.5 1
17 2023-11-30 23:55:25 0.5 1
18 2023-11-30 23:55:30 0.0 1
19 2023-11-30 23:55:35 0.0 1
20 2023-11-30 23:55:40 0.5 1
21 2023-11-30 23:55:45 0.5 1
22 2023-11-30 23:55:50 0.5 1
23 2023-11-30 23:55:55 0.5 1
24 2023-11-30 23:56:00 0.5 1
25 2023-11-30 23:56:05 0.5 1
26 2023-11-30 23:56:10 0.0 0
27 2023-11-30 23:56:15 0.0 0
28 2023-11-30 23:56:20 0.0 0
29 2023-11-30 23:56:25 0.0 0
30 2023-11-30 23:56:30 0.0 0
31 2023-11-30 23:56:35 0.0 0
32 2023-11-30 23:56:40 0.0 0
33 2023-11-30 23:56:45 0.0 0
34 2023-11-30 23:56:50 0.0 0
35 2023-11-30 23:56:55 0.0 0
36 2023-11-30 23:57:00 0.5 0
37 2023-11-30 23:57:05 0.5 0
38 2023-11-30 23:57:10 0.0 0
39 2023-11-30 23:57:15 0.0 0
40 2023-11-30 23:57:20 0.0 0
41 2023-11-30 23:57:25 0.0 0
42 2023-11-30 23:57:30 0.0 0
43 2023-11-30 23:57:35 0.0 0
44 2023-11-30 23:57:40 0.0 0
45 2023-11-30 23:57:45 0.0 0
46 2023-11-30 23:57:50 0.0 0
47 2023-11-30 23:57:55 0.0 0
48 2023-11-30 23:58:00 0.5 1
49 2023-11-30 23:58:05 0.5 1
50 2023-11-30 23:58:10 0.5 1
51 2023-11-30 23:58:15 0.5 1
52 2023-11-30 23:58:20 0.5 1
53 2023-11-30 23:58:25 0.5 1
54 2023-11-30 23:58:30 0.5 1
55 2023-11-30 23:58:35 0.5 1
56 2023-11-30 23:58:40 0.5 1
57 2023-11-30 23:58:45 0.5 1
58 2023-11-30 23:58:50 0.5 1
59 2023-11-30 23:58:55 0.0 0
60 2023-11-30 23:59:00 0.0 0
61 2023-11-30 23:59:05 0.0 0
62 2023-11-30 23:59:10 0.0 0