Search code examples
pandaswindow

Pandas: number of occurrences in the last N hours for each row by row's criteria


Given this dataset:

ID timestamp           type
 1 2022-12-12 01:00:00 TypeA
 2 2022-12-12 05:00:00 TypeA
 3 2022-12-12 06:00:00 TypeA
 4 2022-12-12 07:00.00 TypeB
 5 2022-12-13 00:00:00 TypeA
 6 2022-12-13 02:00:00 TypeB
 7 2022-12-13 23:00:00 TypeA

I want, for each row, to count the number of rows of its same type in the last N hours. In the example, for N=24h, that would be:

ID timestamp           type  count
 1 2022-12-12 01:00:00 TypeA 0
 2 2022-12-12 05:00:00 TypeA 1
 3 2022-12-12 06:00:00 TypeA 2
 4 2022-12-12 07:00.00 TypeB 0
 5 2022-12-13 00:00:00 TypeA 3
 6 2022-12-13 02:00:00 TypeB 1
 7 2022-12-13 23:00:00 TypeA 1

Solution

  • I am not sure if this is what you want?

    In [96]: df.set_index("datetime").groupby("type").rolling("24h").count()
    Out[96]:
                                ID  timestamp
    type  datetime
    TypeA 2022-12-12 01:00:00  1.0        1.0
          2022-12-12 05:00:00  2.0        2.0
          2022-12-12 06:00:00  3.0        3.0
          2022-12-13 00:00:00  4.0        4.0
          2022-12-13 23:00:00  2.0        2.0
    TypeB 2022-12-12 07:00:00  1.0        1.0
          2022-12-13 02:00:00  2.0        2.0
    
    In [125]: df.set_index("datetime").groupby("type")['ID'].rolling("23h", min_periods=0, closed="left").agg({"count":"count"})
    Out[125]:
                               count
    type  datetime
    TypeA 2022-12-12 01:00:00    0.0
          2022-12-12 05:00:00    1.0
          2022-12-12 06:00:00    2.0
          2022-12-13 00:00:00    3.0
          2022-12-13 23:00:00    1.0
    TypeB 2022-12-12 07:00:00    0.0
          2022-12-13 02:00:00    1.0