Shortest intervals that contain X rows


I have a dataframe like this:

import pandas as pd

data = {
  "timeStamp": ['06:00:00', '06:03:00', '06:10:00', '06:30:00', '06:32:00', '06:02:00', '06:05:00', '06:06:00', '06:55:00', '06:00:00', '06:01:00', '06:20:00', '07:00:00'],
  "Event": ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'D']
}

df = pd.DataFrame(data)

For each Event group, I need the shortest interval that contains 3 or more rows.

In the given example:

  • during event "A", the shortest interval with 3 rows is 10 minutes (from 06:00:00 to 06:10:00)
  • during event "B", the shortest interval with 3 rows is 4 minutes (from 06:02:00 to 06:06:00)
  • during event "C", the shortest interval with 3 rows is 20 minutes (from 06:00:00 to 06:20:00)
  • during event "D" there is no such interval, because it has only one row
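All of these examples follow from one observation: once a group's timestamps are sorted, the shortest interval containing k rows always starts and ends on rows that are k - 1 positions apart. Any interval covering k rows spans at least from some sorted row i to row i + k - 1. So it is enough to check every window of k consecutive timestamps. A minimal plain-Python sketch of that check, using a hypothetical helper `shortest_span` and the event times from the question:

```python
from datetime import datetime, timedelta
from typing import Optional


def shortest_span(times: list[str], k: int = 3) -> Optional[timedelta]:
    """Hypothetical helper: shortest interval covering k of the given HH:MM:SS times."""
    if len(times) < k:
        return None  # not enough rows, like event "D"
    parsed = sorted(datetime.strptime(t, "%H:%M:%S") for t in times)
    # Window i covers parsed[i] .. parsed[i + k - 1]; keep the narrowest one
    return min(parsed[i + k - 1] - parsed[i] for i in range(len(parsed) - k + 1))


events = {
    "A": ["06:00:00", "06:03:00", "06:10:00", "06:30:00", "06:32:00"],
    "B": ["06:02:00", "06:05:00", "06:06:00", "06:55:00"],
    "C": ["06:00:00", "06:01:00", "06:20:00"],
    "D": ["07:00:00"],
}

for name, times in events.items():
    print(name, shortest_span(times))
```

For k = 3 this gives 10, 4 and 20 minutes for A, B and C, and None for D, matching the values listed above.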

The desired output looks like this:

Event Interval
A 00:10:00
B 00:04:00
C 00:20:00
D NA

...

etc

Is there any elegant way to do this?


Solution

  • You can groupby("Event") and then apply a custom aggregation function.

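    import pandas as pd

    # Parse the time strings as datetimes so intervals can be computed
    df['timeStamp'] = pd.to_datetime(df['timeStamp'], format='%H:%M:%S')

    def find_shortest_interval_3(times):
        # With fewer than 3 rows, no interval can contain 3 of them
        if len(times) < 3:
            return pd.NaT

        times = times.sort_values()
        min_interval = pd.Timedelta.max
        # Each window of 3 consecutive sorted timestamps spans times[i] .. times[i + 2]
        # (to support other row counts k, replace 2 with k - 1 here and below)
        for i in range(len(times) - 2):
            current_interval = times.iloc[i + 2] - times.iloc[i]
            if current_interval < min_interval:
                min_interval = current_interval
        return min_interval

    # Select the timeStamp column so apply works on a Series, not on the grouping column
    print(df.groupby('Event')['timeStamp'].apply(find_shortest_interval_3))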
    

    Output:

    Event
    A   0 days 00:10:00
    B   0 days 00:04:00
    C   0 days 00:20:00
    D               NaT
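Since the question asks for something more elegant than a loop, here is a possible vectorized alternative. After sorting, `diff(k - 1)` inside each group gives the span of every window of k consecutive rows, and the per-group minimum of those spans is the answer. This is a sketch assuming a reasonably recent pandas. `shortest_intervals` and its `k` parameter are hypothetical names. Unlike the loop above, it parses the times with `pd.to_timedelta` so no calendar date is attached to them.

```python
import pandas as pd

data = {
    "timeStamp": ['06:00:00', '06:03:00', '06:10:00', '06:30:00', '06:32:00',
                  '06:02:00', '06:05:00', '06:06:00', '06:55:00',
                  '06:00:00', '06:01:00', '06:20:00', '07:00:00'],
    "Event": ['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'D'],
}
df = pd.DataFrame(data)
# Time-of-day strings parse directly as Timedeltas (offsets from midnight)
df["timeStamp"] = pd.to_timedelta(df["timeStamp"])


def shortest_intervals(df: pd.DataFrame, k: int = 3) -> pd.DataFrame:
    """Hypothetical helper: shortest span covering k rows per Event (NaT if fewer rows)."""
    ordered = df.sort_values(["Event", "timeStamp"])
    # Within each group: timeStamp[i] - timeStamp[i - (k - 1)], the span of the
    # window of k consecutive rows ending at row i (NaT for a group's first k - 1 rows)
    spans = ordered.groupby("Event")["timeStamp"].diff(k - 1)
    # Narrowest window per group; a group with fewer than k rows is all NaT and stays NaT
    return spans.groupby(ordered["Event"]).min().reset_index(name="Interval")


print(shortest_intervals(df))
```

The result has one row per event with `Event` and `Interval` columns, as in the desired output, and `NaT` for D. Passing, for example, `k=2` changes the row count without touching the loop logic.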