My data set looks like the below:
order_id | canceled_at | accepted_at | Delta | |
---|---|---|---|---|
1 | 2021-07-02 23:30:26 | 2021-07-02 23:29:21 | 0 days 00:01:05 | |
2 | 2021-09-17 20:35:44 | 2021-09-17 20:35:33 | 0 days 00:00:11 | |
3 | 2021-05-21 11:47:28 | 2021-05-21 11:37:19 | 0 days 00:10:09 | |
4 | 2021-06-18 7:34:06 | 2021-06-18 7:31:08 | 0 days 00:02:58 | |
5 | 2021-05-16 15:54:34 | 2021-05-16 15:53:43 | 0 days 00:00:51 |
Delta is from the time the customer canceled the order to the time they put in their order. My goal is to find the frequency of restaurant order cancellations by time in five second intervals as well as mode, median, mean. However, I attempted the below:
df_new.resample('5S', on='Delta').count().head(5)
My results were not ideal:
Delta | order_id | canceled_at | accepted_at | Delta |
---|---|---|---|---|
0 days 00:00:01 | 656 | 656 | 656 | 656 |
0 days 00:00:06 | 1348 | 1348 | 1348 | 1348 |
0 days 00:00:11 | 3874 | 3874 | 3874 | 3874 |
0 days 00:00:16 | 3586 | 3586 | 3586 | 3586 |
0 days 00:00:21 | 2916 | 2916 | 2916 | 2916 |
I'm not sure what's happening and I was hoping maybe someone could help? Thanks so much in advance! I think the count is filling everything.
Ideally I would just want the 2 columns
5 Second Intervals | Count of Cancellations | |||
---|---|---|---|---|
1 | 25 | |||
6 | 30 | |||
11 | 6 | |||
... | ||||
... | ||||
1 minute and 1 second | 32 |
(((df['Delta'].dt.seconds - 1) // 5) * 5 + 1).value_counts().reset_index()
The maths converts each Delta
into a group-label. The //
takes the quotient of division of 5 so that 1
, 2
, 3
, 4
, 5
seconds (after subtracted by 1
) all become group 0
. We then revert it back into seconds by * 5 + 1
. value_counts
does exactly the counting. reset_index
gives you back a dataframe.