I'm working on a problem where I need to analyze the power-on events of diodes over time. For simplicity, let's say that my objects are diodes, and the "date" represents the date and time when a particular diode is turned on. I record these timestamps every second. For example, consider the following data:
| Object_ID | date |
|---|---|
| 1 | 2024-07-17T12:00:00 |
| 1 | 2024-07-17T12:00:01 |
| 1 | 2024-07-17T15:00:00 |
| 2 | 2024-07-17T22:00:00 |
| 2 | 2024-07-17T23:00:00 |
| 3 | 2024-07-17T12:00:00 |
| 4 | 2024-07-17T13:00:00 |
From the first three rows, we can infer that diode 1 was powered on at 12:00:00, stayed on for another second, then was turned off and back on 3 hours later for one second (or less).
What I need to determine is, for each diode: how many distinct power-on events occurred, and the total time it was powered on.
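For reference, the sample data from the question can be rebuilt as a small DataFrame (column names taken from the table above):

```python
import pandas as pd

# reconstruct the sample data from the question
df = pd.DataFrame({
    'Object_ID': [1, 1, 1, 2, 2, 3, 4],
    'date': ['2024-07-17T12:00:00', '2024-07-17T12:00:01',
             '2024-07-17T15:00:00', '2024-07-17T22:00:00',
             '2024-07-17T23:00:00', '2024-07-17T12:00:00',
             '2024-07-17T13:00:00'],
})
df['date'] = pd.to_datetime(df['date'])
```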
You can use a diff and some filtering to keep only the diffs that are at or below the step:
```python
import pandas as pd

# ensure datetime
df['date'] = pd.to_datetime(df['date'])

def duration(s, step='1s'):
    diff = s.diff()    # diff of successive datetimes
    m = diff.le(step)  # keep only diffs within the step
    # sum the kept deltas, then add one step per event start
    return diff[m].sum() + pd.Timedelta(step) * (~m).sum()

out = (df.groupby('Object_ID', as_index=False)
         .agg(**{'n_events': ('date', lambda x: (~x.diff().le('1s')).sum()),
                 'duration': ('date', duration),
                })
      )
```
Output:
```
   Object_ID  n_events        duration
0          1         2 0 days 00:00:03
1          2         2 0 days 00:00:02
2          3         1 0 days 00:00:01
3          4         1 0 days 00:00:01
```
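To see how the mask works, here is the intermediate diff/mask for diode 1 alone (a sketch using the three timestamps from the question):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2024-07-17T12:00:00',
                              '2024-07-17T12:00:01',
                              '2024-07-17T15:00:00']))
diff = s.diff()    # NaT, 1 second, 3 hours
m = diff.le('1s')  # False, True, False: both NaT and the 3h gap fail
# the kept delta sums to 1s; the two False values mark two event starts,
# each contributing one extra step of 1s, giving 3s in total
total = diff[m].sum() + pd.Timedelta('1s') * (~m).sum()
```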
You can use similar logic: first form the groups based on a diff greater than the step (or another threshold if you prefer), then group again and aggregate the duration:
```python
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
step = pd.Timedelta('1s')

g = df.groupby('Object_ID')['date']

out = (df.assign(duration=g.diff(),
                 event=g.transform(lambda x: x.diff().gt(step).cumsum()))
         .groupby(['Object_ID', 'event'], as_index=False)
         ['duration'].agg(lambda x: x.iloc[1:].sum() + step)
      )
```
Output:
```
   Object_ID  event        duration
0          1      0 0 days 00:00:02
1          1      1 0 days 00:00:01
2          2      0 0 days 00:00:01
3          2      1 0 days 00:00:01
4          3      0 0 days 00:00:01
5          4      0 0 days 00:00:01
```
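As a cross-check, summing these per-event durations per diode reproduces the per-diode totals of the first approach (a sketch built from the values printed above):

```python
import pandas as pd

# per-event durations as shown in the output above
out = pd.DataFrame({
    'Object_ID': [1, 1, 2, 2, 3, 4],
    'event':     [0, 1, 0, 1, 0, 0],
    'duration':  pd.to_timedelta(['2s', '1s', '1s', '1s', '1s', '1s']),
})
# collapse events back to one total duration per diode
totals = out.groupby('Object_ID')['duration'].sum()
```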