I'm trying to get the frequency count of a groupby which is grouped by name and date. I am having trouble figuring out how to do a 3 days roll count prior to the current day.
example: on 2022-01-05, John's 3 days range are 2022-01-05 and 2022-01-01 with 2 times each, hence, showing 4 in the '3 days roll count' column. And '3 day completion count' column shows that John complete the work 1 time when working on 2022-01-05 and 2022-01-01. Kindly scroll down 'result trying to get' for df sample.
for clarification, 3 days range is a rolling range counting between 1 to 3, hence, the example shows 2 days. And the data starts on 2022-01-01, therefore, any dates before 2022-01-01 are reflects as 0. Lastly, on 2022-01-08, John's 3 days roll count equals to 4 is because the 3 days are 3rd, 5th, and 7th.
sample df
╔════╦════════════╦══════╦════════════╗
║ ║ date_id ║ name ║ completion ║
╠════╬════════════╬══════╬════════════╣
║ 0 ║ 2022-01-01 ║ John ║ Y ║
║ 1 ║ 2022-01-01 ║ John ║ N ║
║ 2 ║ 2022-01-02 ║ Sam ║ N ║
║ 3 ║ 2022-01-02 ║ Sam ║ N ║
║ 4 ║ 2022-01-03 ║ John ║ N ║
║ 5 ║ 2022-01-03 ║ John ║ N ║
║ 6 ║ 2022-01-03 ║ May ║ Y ║
║ 7 ║ 2022-01-04 ║ Sam ║ Y ║
║ 8 ║ 2022-01-04 ║ Sam ║ Y ║
║ 9 ║ 2022-01-05 ║ John ║ Y ║
║ 10 ║ 2022-01-05 ║ May ║ N ║
║ 11 ║ 2022-01-05 ║ May ║ Y ║
║ 12 ║ 2022-01-06 ║ Sam ║ N ║
║ 13 ║ 2022-01-07 ║ John ║ Y ║
║ 14 ║ 2022-01-08 ║ John ║ N ║
╚════╩════════════╩══════╩════════════╝
result trying to get
╔════╦════════════╦══════╦════════════╦═══════════════════╦════════════════════════╗
║ ║ date_id ║ name ║ completion ║ 3 days roll count ║ 3 day completion count ║
╠════╬════════════╬══════╬════════════╬═══════════════════╬════════════════════════╣
║ 0 ║ 2022-01-01 ║ John ║ Y ║ 0 ║ 0 ║
║ 1 ║ 2022-01-01 ║ John ║ N ║ 0 ║ 0 ║
║ 2 ║ 2022-01-02 ║ Sam ║ N ║ 0 ║ 0 ║
║ 3 ║ 2022-01-02 ║ Sam ║ N ║ 0 ║ 0 ║
║ 4 ║ 2022-01-03 ║ John ║ N ║ 2 ║ 1 ║
║ 5 ║ 2022-01-03 ║ John ║ N ║ 2 ║ 1 ║
║ 6 ║ 2022-01-03 ║ May ║ Y ║ 0 ║ 0 ║
║ 7 ║ 2022-01-04 ║ Sam ║ Y ║ 2 ║ 0 ║
║ 8 ║ 2022-01-04 ║ Sam ║ Y ║ 2 ║ 0 ║
║ 9 ║ 2022-01-05 ║ John ║ Y ║ 4 ║ 1 ║
║ 10 ║ 2022-01-05 ║ May ║ N ║ 1 ║ 1 ║
║ 11 ║ 2022-01-05 ║ May ║ Y ║ 1 ║ 1 ║
║ 12 ║ 2022-01-06 ║ Sam ║ N ║ 4 ║ 2 ║
║ 13 ║ 2022-01-07 ║ John ║ Y ║ 5 ║ 2 ║
║ 14 ║ 2022-01-08 ║ John ║ N ║ 4 ║ 2 ║
╚════╩════════════╩══════╩════════════╩═══════════════════╩════════════════════════╝
code for sample df
'date_id': [
'2022-01-01','2022-01-01','2022-01-02','2022-01-02','2022-01-03',
'2022-01-03','2022-01-03','2022-01-04','2022-01-04','2022-01-05',
'2022-01-05','2022-01-05','2022-01-06','2022-01-07','2022-01-08',
],
'name': [
'John','John','Sam','Sam','John',
'John','May','Sam','Sam','John',
'May','May','Sam','John','John',
],
'completion': [
'Y','N','N','N','N',
'N','Y','Y','Y','Y',
'N','Y','N','Y','N'
],
})
ps. i've tried using apply,lambda,shift with pd.to_timedelta found here to shift the dates but wasn't able to get the count right.
kindly advise
# is work completed?
df['is_complete'] = df['completion'] == 'Y'
# groupby name and date and aggregate to calculate how many
# times a work is attempted and how many times its actually completed
df1 = df.groupby(['name', 'date_id'])['is_complete'].agg(['count', 'sum'])
# Calculate a 4 days rolling sum per `name` and subtract the
# current value to get the effective 3 days rolling sum
df1 = df1.groupby(level=0).rolling(4, min_periods=1).sum().droplevel(1) - df1
df1.columns = ['3 days roll count', '3 day completion count']
# Merge the result with original dataframe
df1 = df.merge(df1.reset_index(), how='left')
date_id name completion is_complete 3 days roll count 3 day completion count
0 2022-01-01 John Y True 0.0 0.0
1 2022-01-01 John N False 0.0 0.0
2 2022-01-02 Sam N False 0.0 0.0
3 2022-01-02 Sam N False 0.0 0.0
4 2022-01-03 John N False 2.0 1.0
5 2022-01-03 John N False 2.0 1.0
6 2022-01-03 May Y True 0.0 0.0
7 2022-01-04 Sam Y True 2.0 0.0
8 2022-01-04 Sam Y True 2.0 0.0
9 2022-01-05 John Y True 4.0 1.0
10 2022-01-05 May N False 1.0 1.0
11 2022-01-05 May Y True 1.0 1.0
12 2022-01-06 Sam N False 4.0 2.0
13 2022-01-07 John Y True 5.0 2.0
14 2022-01-08 John N False 4.0 2.0