I have a dataframe like as below
df = pd.DataFrame(
{'stud_id' : [101, 101, 101, 101,
101, 102, 102, 102],
'sub_code' : ['CSE01', 'CSE01', 'CSE01',
'CSE01', 'CSE02', 'CSE02',
'CSE02', 'CSE02'],
'ques_date' : ['10/11/2022', '06/06/2022','09/04/2022', '27/03/2022',
'13/05/2010', '10/11/2021','11/1/2022', '27/02/2022'],
'revenue' : [77, 86, 55, 90,
65, 90, 80, 67]}
)
df['ques_date'] = pd.to_datetime(df['ques_date'])
I would like to do the below
a) Compute custom financial year based on our organization FY calendar. Meaning, Oct-Dec is Q1, Jan -Mar is Q2,Apr - Jun is Q3 and July to Sep is Q4.
b) Group by stud_id
c) Compute sum of revenue from previous two custom FYs (from a specific date = 20/12/2022). For example, if we are in the FY-2023, I would like to get the sum of revenue for a customer from FY-2022 and FY-2021 separately
So, I tried the below based on this post here
df['custom_qtr'] = pd.to_datetime(df['ques_date'], dayfirst=True).dt.to_period('Q-SEP')
date_1 = pd.to_datetime('20-12-2022') # CUT-OFF DATE
df['custom_year'] = df['custom_qtr'].astype(str).str.extract('(?P<year>\d+)')
df['date_based_qtr'] = date_1.to_period('Q-SEP')
df['custom_date_year'] = df['date_based_qtr'].astype(str).str.extract('(?P<year>\d+)')
df['custom_year'] = df['custom_year'].astype(int)
df['custom_date_year'] = df['custom_date_year'].astype(int)
df['diff'] = df['custom_date_year'].sub(df['custom_year'])
df = df[df['diff'].isin([1,2])]
out_df = df.pivot_table("revenue", index=['stud_id'],columns=['custom_year'],aggfunc=['sum']).add_prefix('rev_').reset_index().droplevel(0,axis=1)
But this results in an inconcistent output column like below
I expect my output to be like as below
updated output
It seems you just need to first fine the quarter year, filter to only include 2021 and 2022 rows, and then summarize & pivot:
(df.assign(
qyear = pd.to_datetime(df['ques_date'], dayfirst=True).dt.to_period('Q-SEP').dt.qyear
)[lambda x: x.qyear.isin([2021, 2022])]
.assign(qyear=lambda x: x.qyear.astype('category').cat.set_categories([2021, 2022]))
.groupby(['stud_id', 'qyear'])
.revenue.sum()
.unstack(level=1)
.add_prefix('rev_')
.reset_index(drop=False))
#qyear stud_id rev_2021 rev_2022
#0 101 0 231
#1 102 0 157
For the update:
df['qyear'] = pd.to_datetime(df['ques_date'], dayfirst=True).dt.to_period('Q-SEP').dt.qyear.astype('category').cat.set_categories([2021, 2022])
df.groupby(['stud_id', 'sub_code', 'qyear']).revenue.sum().unstack(level=1, fill_value=0).add_prefix('rev_').reset_index(drop=False)
sub_code stud_id qyear rev_CSE01 rev_CSE02
0 101 2021 0 0
1 101 2022 231 0
2 102 2021 0 0
3 102 2022 0 157