I'm working on a problem where I need to analyze the power-on events of diodes over time. For simplicity, let's say that my objects are diodes, and the "date" represents the date and time when a particular diode is turned on. I record these timestamps every second. For example, consider the following data:
| Object_ID | date |
|---|---|
| 1 | 2024-07-17T12:00:00 |
| 1 | 2024-07-17T12:00:01 |
| 1 | 2024-07-17T15:00:00 |
| 2 | 2024-07-17T22:00:00 |
| 2 | 2024-07-17T23:00:00 |
| 3 | 2024-07-17T12:00:00 |
| 4 | 2024-07-17T13:00:00 |
From the first three rows, we can infer that diode 1 was powered on at 12:00:00, stayed on for another second, then was turned off and back on 3 hours later for one second (or less).
What I need to determine is, for each diode: how many distinct power-on events occurred, and the total time it was powered on.
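For reference, the sample data from the question can be rebuilt as a small DataFrame (column names taken from the table above):

```python
import pandas as pd

# reconstruct the sample data from the question
df = pd.DataFrame({
    'Object_ID': [1, 1, 1, 2, 2, 3, 4],
    'date': ['2024-07-17T12:00:00', '2024-07-17T12:00:01',
             '2024-07-17T15:00:00', '2024-07-17T22:00:00',
             '2024-07-17T23:00:00', '2024-07-17T12:00:00',
             '2024-07-17T13:00:00'],
})
df['date'] = pd.to_datetime(df['date'])
```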
You can use a diff and some filtering to keep only the diffs that are at or below the step:
```python
import pandas as pd

# ensure datetime
df['date'] = pd.to_datetime(df['date'])

def duration(s, step='1s'):
    diff = s.diff()    # diff of successive datetimes
    m = diff.le(step)  # keep only diffs within the step
    # sum the kept deltas, then add one step per event start
    return diff[m].sum() + pd.Timedelta(step) * (~m).sum()

out = (df.groupby('Object_ID', as_index=False)
         .agg(**{'n_events': ('date', lambda x: (~x.diff().le('1s')).sum()),
                 'duration': ('date', duration),
                })
      )
```
Output:
```
   Object_ID  n_events        duration
0          1         2 0 days 00:00:03
1          2         2 0 days 00:00:02
2          3         1 0 days 00:00:01
3          4         1 0 days 00:00:01
```
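To see how the mask works, here is the intermediate diff/mask for diode 1 alone (a sketch using the three timestamps from the question):

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2024-07-17T12:00:00',
                              '2024-07-17T12:00:01',
                              '2024-07-17T15:00:00']))
diff = s.diff()    # NaT, 1 second, 3 hours
m = diff.le('1s')  # False, True, False: both NaT and the 3h gap fail
# the kept delta sums to 1s; the two False values mark two event starts,
# each contributing one extra step of 1s, giving 3s in total
total = diff[m].sum() + pd.Timedelta('1s') * (~m).sum()
```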
You can use similar logic: first form the groups based on a diff greater than the step (or another threshold if you prefer), then group again and aggregate the duration:
```python
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
step = pd.Timedelta('1s')

g = df.groupby('Object_ID')['date']

out = (df.assign(duration=g.diff(),
                 event=g.transform(lambda x: x.diff().gt(step).cumsum()))
         .groupby(['Object_ID', 'event'], as_index=False)
         ['duration'].agg(lambda x: x.iloc[1:].sum() + step)
      )
```
Output:
```
   Object_ID  event        duration
0          1      0 0 days 00:00:02
1          1      1 0 days 00:00:01
2          2      0 0 days 00:00:01
3          2      1 0 days 00:00:01
4          3      0 0 days 00:00:01
5          4      0 0 days 00:00:01
```
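As a cross-check, summing these per-event durations per diode reproduces the per-diode totals of the first approach (a sketch built from the values printed above):

```python
import pandas as pd

# per-event durations as shown in the output above
out = pd.DataFrame({
    'Object_ID': [1, 1, 2, 2, 3, 4],
    'event':     [0, 1, 0, 1, 0, 0],
    'duration':  pd.to_timedelta(['2s', '1s', '1s', '1s', '1s', '1s']),
})
# collapse events back to one total duration per diode
totals = out.groupby('Object_ID')['duration'].sum()
```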