Search code examples
pythonpandassumrow

sum of specific rows pandas dataframe


I have a dataframe and would like to add sums of specific rows into this dataframe. So for example I have

df = pd.DataFrame({'prod':['a','a','a','b','b','b','c','c','c'], 'attribute':['x','y','z','x','y','z','x','y','z'],
                  'number1':[1,2,2,3,4,3,5,1,1], 'number2':[10,2,3,3,1,2,3,1,1], 'number3':[1,4,3,5,7,1,3,0,1]})

How can I add for each prod a, b and c the sum of number 1/2/3 of the attributes y and z as a new row? So it looks like this

    prod    attribute   number1 number2 number3
0   a       x           1       10      1
1   a       y           2       2       4
2   a       z           2       3       3
3   a       sum_yz      4       5       7
4   b       x           3       3       5
5   b       y           4       1       7
6   b       z           3       2       1
7   b       sum_yz      7       3       8
8   c       x           5       3       3
9   c       y           1       1       0
10  c       z           1       1       1
11  c       sum_yz      2       2       1

Solution

  • You could make a separate DataFrane and append it back to the original DataFrame, something like this (this code is untested):

    # Filter to the desired attributes
    sum_yz = df[df['attribute'].isin(['y', 'z'])]
    # Set the new 'attribute' value
    sum_yz['attribute'] = 'sum_yz'
    # Group by and sum
    sum_yz = sum_yz.groupby(['prod', 'attribute']).sum().reset_index()
    
    # Add it the end of the data frame
    df = pd.concat([df, sum_yz])