I have a pandas dataframe given below:
ID Year R1 R1_f
KAR1 20201001 1 5
KAR1 20201101 2 6
KAR1 20201201 3 7
KAR1 20210101 4 8
KAR1 20210201 5 9
KAR1 20210301 6 10
KAR1 20210401 7 11
KAR1 20210501 8 12
KAR1 20210601 9 13
KAR1 20210701 10 14
KAR1 20210801 11 15
KAR1 20210901 12 16
KAR2 20201001 4 9
KAR2 20201101 3 8
KAR2 20201201 2 7
KAR2 20210101 1 6
KAR2 20210201 9 5
KAR2 20210301 2 4
KAR2 20210401 6 3
KAR2 20210501 5 2
KAR2 20210601 3 1
KAR2 20210701 30 2
KAR2 20210801 34 3
KAR2 20210901 20 4
I need to transform above dataframe as given below:
ID Year R1_sum 3m_R1 6m_R1 9m_R1 12m_R1 R1_f 3m_R1_f 6m_R1_f 9m_R1_f 12m_R1_f
KAR1 20210901 12 33 57 72 78 16 45 81 108 126
KAR2 20210901 20 84 98 110 119 4 9 15 30 54
In above output dataframe:
R1_sum is having value equal to value in year 20210901 for both Id's.
3m_R1 is the summation of values of 3 months 20210901 to 20210701 for column R1
6m_R1 is the summation of values of 6 months from 20210901 to 20210401 for column R1
9m_R1 is the summation of values of 9 months from 20210901 to 20210101 for column R1
12m_R1 is the summation of values of 12 months from 20210901 to 20201001 for column R1
R1_f is having value equal to value in year 20210901 for both Id's.
3m_R1_f is the summation of values of 3 months 20210901 to 20210701 for column R1_f
6m_R1_f is the summation of values of 6 months from 20210901 to 20210401 for column R1_f
9m_R1_f is the summation of values of 9 months from 20210901 to 20210101 for column R1_f
12m_R1_f is the summation of values of 12 months from 20210901 to 20201001 for column R1_f
Please help
For count from last months to first months per groups ID
first crate helper groups by GroupBy.cumcount
and aggregate sum
, then aggregate sum
with GroupBy.cumsum
, reshape by DataFrame.unstack
, flatten MultiIndex in columns
and add to DataFrame created by GroupBy.last
:
Data:
df = pd.DataFrame({'ID': ['KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR1',
'KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR1', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2', 'KAR2'],
'Year': [20201001, 20201101, 20201201, 20210101, 20210201, 20210301,
20210401, 20210501, 20210601, 20210701, 20210801, 20210901,
20201001, 20201101, 20201201, 20210101, 20210201, 20210301,
20210401, 20210501, 20210601, 20210701, 20210801, 20210901],
'R1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 4, 3, 2, 1, 9, 2, 6, 5, 3, 30, 34, 20],
'R1_f': [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 9, 8, 7, 6, 5, 4, 3, 2, 1, 2, 3, 4]})
print (df)
cols = ['R1','R1_f']
g = df.groupby('ID').cumcount(ascending=False) // 3
df1 = df.groupby(['ID',g])[cols].sum().groupby(level=0).cumsum().unstack()
print (df1)
R1 R1_f
0 1 2 3 0 1 2 3
ID
KAR1 33 57 72 78 45 81 108 126
KAR2 84 98 110 119 9 15 30 54
df2 = df.groupby('ID')[['Year'] + cols].last()
df2.columns = pd.MultiIndex.from_product([df2.columns, [-1]])
print (df2)
-1 -1 -1
ID
KAR1 20210901 12 16
KAR2 20210901 20 4
df = df2.join(df1).sort_index(axis=1)
df.columns = [f'{(b + 1) * 3}m_{a}' if b!=-1 else f'{a}_sum' for a, b in df.columns]
df = df.reset_index()
df.insert(1, 'Year', df.pop('Year_sum'))
print (df)
ID Year R1_sum 3m_R1 6m_R1 9m_R1 12m_R1 R1_f_sum 3m_R1_f \
0 KAR1 20210901 12 33 57 72 78 16 45
1 KAR2 20210901 20 84 98 110 119 4 9
6m_R1_f 9m_R1_f 12m_R1_f
0 81 108 126
1 15 30 54