I have the following dataframe df
:
A B C
0 21 Blue 100
1 33 Yellow 100
2 17 White 250
3 A2 Grey 40
4 65 Green 500
5 33 Red 80
6 17 Purple -50
7 A2 Orange 600
Column B is basically irrelevant information IRT the code itself but still needs to be included in the output. I have sorted the dataframe by column A and gotten around the issue that col A has contains both int and str:
df['A'] = df['A'].astype(str)
df_sorted = df.sort_values(by=['A'])
So now df_sorted
looks like this:
A B C
2 17 White 250
6 17 Purple -50
0 21 Blue 100
1 33 Yellow 100
5 33 Red 80
4 65 Green 500
3 A2 Grey 40
7 A2 Orange 600
My question is: How can I then make subtotals for each change in col A by summarizing col C similar to Excel's subtotal function? The final output of the dataframe should look like this:
A B C
2 17 White 250
6 17 Purple -50
Subtotal 200
0 21 Blue 100
Subtotal 100
1 33 Yellow 100
5 33 Red 80
Subtotal 180
4 65 Green 500
Subtotal 500
3 A2 Grey 40
7 A2 Orange 600
Subtotal 640
you can concat
your original df and the groupby subtotal.
df1 = pd.concat([df,
df.groupby(['A'],as_index=False)['C'].sum()]).sort_values('A')
df1.loc[df1['B'].isnull(), 'A'] = 'Subtotal'
print(df1.fillna(''))
A B C
2 17 White 250
6 17 Purple -50
0 Subtotal 200
0 21 Blue 100
1 Subtotal 100
1 33 Yellow 100
5 33 Red 80
2 Subtotal 180
4 65 Green 500
3 Subtotal 500
3 A2 Grey 40
7 A2 Orange 600
4 Subtotal 640