Search code examples
pythonpandassubtotal

Pandas subtotal similar to Excel


I have the following dataframe df:

     A   B      C
0    21  Blue   100 
1    33  Yellow 100 
2    17  White  250 
3    A2  Grey   40
4    65  Green  500 
5    33  Red    80 
6    17  Purple -50
7    A2  Orange 600

Column B is basically irrelevant information IRT the code itself but still needs to be included in the output. I have sorted the dataframe by column A and gotten around the issue that col A has contains both int and str:

df['A'] = df['A'].astype(str)
df_sorted = df.sort_values(by=['A'])

So now df_sorted looks like this:

     A   B      C
2    17  White  250
6    17  Purple -50
0    21  Blue   100
1    33  Yellow 100
5    33  Red    80
4    65  Green  500
3    A2  Grey   40
7    A2  Orange 600

My question is: How can I then make subtotals for each change in col A by summarizing col C similar to Excel's subtotal function? The final output of the dataframe should look like this:

     A        B      C
2    17       White  250
6    17       Purple -50
     Subtotal        200  
0    21       Blue   100
     Subtotal        100
1    33       Yellow 100
5    33       Red    80
     Subtotal        180
4    65       Green  500
     Subtotal        500
3    A2       Grey   40
7    A2       Orange 600
     Subtotal        640

Solution

  • you can concat your original df and the groupby subtotal.

    df1 =  pd.concat([df,
                 df.groupby(['A'],as_index=False)['C'].sum()]).sort_values('A')
    
    df1.loc[df1['B'].isnull(), 'A'] = 'Subtotal'
    
    print(df1.fillna(''))
    
              A       B    C
    2        17   White  250
    6        17  Purple  -50
    0  Subtotal          200
    0        21    Blue  100
    1  Subtotal          100
    1        33  Yellow  100
    5        33     Red   80
    2  Subtotal          180
    4        65   Green  500
    3  Subtotal          500
    3        A2    Grey   40
    7        A2  Orange  600
    4  Subtotal          640