Search code examples
pythonpandasdataframeappendaggregate

How to add rows as sums of other rows in DataFrame?


I'm not sure I titled this post correctly but I have a unique situation where I want to append a new set of rows to an existing DataFrame as a sum of rows from existing sets and I'm not sure where to start.

For example, I have the following DataFrame:

import pandas as pd

data = {'Team': ['Atlanta', 'Atlanta', 'Cleveland', 'Cleveland'],
        'Position': ['Defense', 'Kicker', 'Defense', 'Kicker'],
        'Points': [5, 10, 15, 20]}

df = pd.DataFrame(data)

print(df)

        Team Position  Points
0    Atlanta  Defense       5
1    Atlanta   Kicker      10
2  Cleveland  Defense      15
3  Cleveland   Kicker      20

How do I create/append new rows which create a new position for each team and sum the points of the two existing positions for each team? Additionally, the full dataset consists of several more teams so I'm looking for a solution that will work for any number of teams.

edit: I forgot to include that there are other positions in the complete DataFrame; but, I only want this solution to be applied for the positions "Defense" and "Kicker".

My desired output is below.

            Team Position  Points
        Team          Position  Points
0    Atlanta           Defense       5
1    Atlanta            Kicker      10
2  Cleveland           Defense      15
3  Cleveland            Kicker      20
4    Atlanta  Defense + Kicker      15
5  Cleveland  Defense + Kicker      35

Thanks in advance!


Solution

  • We can use groupby agg to create the summary rows then append to the DataFrame:

    df = df.append(df.groupby('Team', as_index=False).agg({
        'Position': ' + '.join,  # Concat Strings together
        'Points': 'sum'  # Total Points
    }), ignore_index=True)
    

    df:

            Team          Position  Points
    0    Atlanta           Defense       5
    1    Atlanta            Kicker      10
    2  Cleveland           Defense      15
    3  Cleveland            Kicker      20
    4    Atlanta  Defense + Kicker      15
    5  Cleveland  Defense + Kicker      35
    

    We can also whitelist certain positions by filtering df before groupby to aggregate only the desired positions:

    whitelisted_positions = ['Kicker', 'Defense']
    df = df.append(
        df[df['Position'].isin(whitelisted_positions)]
            .groupby('Team', as_index=False).agg({
            'Position': ' + '.join,  # Concat Strings together
            'Points': 'sum'  # Total Points
        }), ignore_index=True
    )