Python - Pandas, group by time intervals


I have the following DataFrame:

group_id                timestamp
       A  2020-09-29 06:00:00 UTC
       A  2020-09-29 08:00:00 UTC
       A  2020-09-30 09:00:00 UTC
       B  2020-09-01 04:00:00 UTC
       B  2020-09-01 06:00:00 UTC

I would like to count the time deltas (in hours) between consecutive records within each group, aggregated over all groups. Deltas between records of different groups should not be counted. Expected result for the example above:

delta       count
    2           2
   25           1

Explanation: In group A the deltas are

06:00:00 -> 08:00:00 (2 hours)
08:00:00 -> 09:00:00 on the next day (25 hours)

And in group B:

04:00:00 -> 06:00:00 (2 hours)

How can I achieve this using Python Pandas?


Solution

  • Use DataFrameGroupBy.diff to get the differences within each group, convert them to seconds with Series.dt.total_seconds, divide by 3600 to get hours, then count the values with Series.value_counts and convert the resulting Series to a two-column DataFrame. This assumes the timestamp column already has a datetime dtype:

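    df1 = (df.groupby("group_id")['timestamp']
             .diff()                # delta to the previous row within each group (NaT for the first row)
             .dt.total_seconds()    # timedelta -> seconds
             .div(3600)             # seconds -> hours
             .value_counts()        # count each distinct delta (NaN rows are dropped)
             .rename_axis('delta')
             .reset_index(name='count'))
    print(df1)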
       delta  count
    0    2.0      2
    1   25.0      1
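
For completeness, here is a minimal end-to-end sketch of the same approach that can be run as-is. It rebuilds the sample data from the question and adds three steps that are assumptions on my part, not part of the chain above: parsing the timestamp strings with pd.to_datetime, sorting within each group so that diff compares consecutive records, and casting the hours to int (which would truncate any fractional hours, so it only makes sense if the deltas are whole hours, as in the sample).

```python
import pandas as pd

# Sample data from the question; timestamps start out as strings
df = pd.DataFrame({
    "group_id": ["A", "A", "A", "B", "B"],
    "timestamp": [
        "2020-09-29 06:00:00 UTC",
        "2020-09-29 08:00:00 UTC",
        "2020-09-30 09:00:00 UTC",
        "2020-09-01 04:00:00 UTC",
        "2020-09-01 06:00:00 UTC",
    ],
})

# Assumption: the column is not yet datetime, so parse it (timezone-aware, UTC)
df["timestamp"] = pd.to_datetime(df["timestamp"], utc=True)

# Assumption: rows may be unordered, so sort before taking differences
df = df.sort_values(["group_id", "timestamp"])

hours = (df.groupby("group_id")["timestamp"]
           .diff()               # delta to the previous row within each group
           .dropna()             # the first row of each group has no predecessor
           .dt.total_seconds()
           .div(3600)
           .astype(int))         # assumption: deltas are whole hours

counts = (hours.value_counts()
               .rename_axis("delta")
               .reset_index(name="count"))
print(counts)
```

For the sample data this gives a delta of 2 hours with count 2 and a delta of 25 hours with count 1, matching the expected result in the question. If the deltas can be fractional, drop the astype(int) step or round to the desired resolution first.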