
How to count the number of power-on events and their durations for diodes based on timestamps?


I'm working on a problem where I need to analyze the power-on events of diodes over time. For simplicity, let's say that my objects are diodes, and each "date" value is a timestamp at which a particular diode was on. While a diode is on, one timestamp is recorded per second. For example, consider the following data:

Object_ID date
1 2024-07-17T12:00:00
1 2024-07-17T12:00:01
1 2024-07-17T15:00:00
2 2024-07-17T22:00:00
2 2024-07-17T23:00:00
3 2024-07-17T12:00:00
4 2024-07-17T13:00:00

From the first three rows, we can infer that diode 1 was powered on at 12:00:00, stayed on for another second, then was turned off and back on 3 hours later for one second (or less).

What I need to determine is:

  • How many times each diode was powered on. (For example, diodes 1 and 2 were powered on 2 times, while diodes 3 and 4 were powered on just once.)
  • The duration of each power-on event for every diode. (For example, for diode 1, it was on for 2 seconds during the first event and 1 second during the second event.)

Solution

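  • You can take the diff of successive timestamps and keep only the differences that are at most one step (1 second here). Every other row, including the first row of each diode, marks the start of a new event: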

    import pandas as pd

    # ensure datetime
    df['date'] = pd.to_datetime(df['date'])
    
    def duration(s, step='1s'):
        diff = s.diff()    # difference between successive timestamps
        m = diff.le(step)  # True where the gap is within one step (same event)
        # sum the in-event gaps, then add one step for each event start
        return diff[m].sum()+pd.Timedelta(step)*(~m).sum()
    
    out = (df.groupby('Object_ID', as_index=False)
             # a row starts a new event when its gap is not within one step
             .agg(**{'n_events': ('date', lambda x: (~x.diff().le('1s')).sum()),
                     'duration': ('date', duration)
                    })
          )
    

    Output:

       Object_ID  n_events        duration
    0          1         2 0 days 00:00:03
    1          2         2 0 days 00:00:02
    2          3         1 0 days 00:00:01
    3          4         1 0 days 00:00:01
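
To see where these numbers come from, it can help to look at the intermediate values for a single diode. The following is only a sketch: it assumes the sample data from the question is rebuilt by hand, and the names `within`, `n_events`, and `total` are illustrative rather than part of the answer above.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption of this sketch)
df = pd.DataFrame({
    "Object_ID": [1, 1, 1, 2, 2, 3, 4],
    "date": pd.to_datetime([
        "2024-07-17T12:00:00", "2024-07-17T12:00:01", "2024-07-17T15:00:00",
        "2024-07-17T22:00:00", "2024-07-17T23:00:00",
        "2024-07-17T12:00:00", "2024-07-17T13:00:00",
    ]),
})

step = pd.Timedelta("1s")

# Look at diode 1 only
s = df.loc[df["Object_ID"] == 1, "date"]

diff = s.diff()         # NaT, 1 second, 2 hours 59 minutes 59 seconds
within = diff.le(step)  # False, True, False (a NaT comparison is False)

# Every row that is not within one step of the previous row starts an event
n_events = int((~within).sum())

# In-event gaps plus one step per event start
total = diff[within].sum() + step * n_events

print(n_events, total)
```

The first row has no predecessor, so its diff is `NaT` and it counts as an event start, which is why a diode with a single timestamp still gets one event lasting one step.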
    

    Aggregating events separately

    You can use similar logic to get one row per event. First, form the groups by starting a new event wherever the diff is greater than the step (or another threshold if you prefer). Then group again by diode and event and aggregate the duration:

    df['date'] = pd.to_datetime(df['date'])
    
    step = pd.Timedelta('1s')
    
    g = df.groupby('Object_ID')['date']
    
    out = (df.assign(duration=g.diff(),
                     event=g.transform(lambda x: x.diff().gt(step).cumsum()))
             .groupby(['Object_ID', 'event'], as_index=False)
             ['duration'].agg(lambda x: x.iloc[1:].sum()+step)
          )
    

    Output:

       Object_ID  event        duration
    0          1      0 0 days 00:00:02
    1          1      1 0 days 00:00:01
    2          2      0 0 days 00:00:01
    3          2      1 0 days 00:00:01
    4          3      0 0 days 00:00:01
    5          4      0 0 days 00:00:01
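
If the start and end time of each event are also of interest, one possible variant is to aggregate the first and last timestamp per event and derive the duration from them. This is a sketch of a different technique from the one above, not a replacement for it: it assumes the sample data from the question is rebuilt by hand, and the names `gap`, `events`, and `summary` are illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the question (an assumption of this sketch)
df = pd.DataFrame({
    "Object_ID": [1, 1, 1, 2, 2, 3, 4],
    "date": pd.to_datetime([
        "2024-07-17T12:00:00", "2024-07-17T12:00:01", "2024-07-17T15:00:00",
        "2024-07-17T22:00:00", "2024-07-17T23:00:00",
        "2024-07-17T12:00:00", "2024-07-17T13:00:00",
    ]),
})

step = pd.Timedelta("1s")

# Make sure timestamps are ordered within each diode
df = df.sort_values(["Object_ID", "date"])

# Gap to the previous timestamp of the same diode (NaT on the first row)
gap = df.groupby("Object_ID")["date"].diff()

# A gap larger than the step starts a new event; number events per diode
event = gap.gt(step).astype(int).groupby(df["Object_ID"]).cumsum()

# One row per (diode, event) with its first and last timestamp
events = (df.assign(event=event)
            .groupby(["Object_ID", "event"], as_index=False)
            .agg(start=("date", "min"), end=("date", "max")))

# The last sample still counts for one step, hence the "+ step"
events["duration"] = events["end"] - events["start"] + step

# Per-diode summary derived from the per-event table
summary = (events.groupby("Object_ID", as_index=False)
                 .agg(n_events=("event", "count"),
                      duration=("duration", "sum")))

print(events)
print(summary)
```

With this layout, both the event count and the total on-time per diode come from the same per-event table, so the two views cannot drift apart if the threshold is changed.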