Search code examples
pythonpandastimefrequency

Finding the frequency of restaurant order cancellations by time in five second intervals


My data set looks like the below:

order_id canceled_at accepted_at Delta
1 2021-07-02 23:30:26 2021-07-02 23:29:21 0 days 00:01:05
2 2021-09-17 20:35:44 2021-09-17 20:35:33 0 days 00:00:11
3 2021-05-21 11:47:28 2021-05-21 11:37:19 0 days 00:10:09
4 2021-06-18 7:34:06 2021-06-18 7:31:08 0 days 00:02:58
5 2021-05-16 15:54:34 2021-05-16 15:53:43 0 days 00:00:51

Delta is from the time the customer canceled the order to the time they put in their order. My goal is to find the frequency of restaurant order cancellations by time in five second intervals as well as mode, median, mean. However, I attempted the below:

df_new.resample('5S', on='Delta').count().head(5)

My results were not ideal:

Delta order_id canceled_at accepted_at Delta
0 days 00:00:01 656 656 656 656
0 days 00:00:06 1348 1348 1348 1348
0 days 00:00:11 3874 3874 3874 3874
0 days 00:00:16 3586 3586 3586 3586
0 days 00:00:21 2916 2916 2916 2916

I'm not sure what's happening and I was hoping maybe someone could help? Thanks so much in advance! I think the count is filling everything.

Ideally I would just want the 2 columns

  1. Delta to be grouped every 5 seconds (the minimum Delta is 1 second so that looks right)
  2. and the count of cancellations per 5 second intervals.
5 Second Intervals Count of Cancellations
1 25
6 30
11 6
...
...
1 minute and 1 second 32

Solution

  • (((df['Delta'].dt.seconds - 1) // 5) * 5 + 1).value_counts().reset_index()
    

    The maths converts each Delta into a group-label. The // takes the quotient of division of 5 so that 1, 2, 3, 4, 5 seconds (after subtracted by 1) all become group 0. We then revert it back into seconds by * 5 + 1. value_counts does exactly the counting. reset_index gives you back a dataframe.