Search code examples
pythonpandasdataframegroup-bydiff

Subtract multiple rows from dataframe


following dataframe:

df = pd.DataFrame({"country": ["Americas", "Americas","Europe", "Europe", "SouthAmerica", "SouthAmerica"], "year":[2000, 2005, 2000, 2005, 2000, 2005], "planted":[100, 200, 300, 100, 500, 300], "regrowth":[200, 400, 500, 200, 700, 100]})
        country  year  planted  regrowth
0      Americas  2000      900       300
1      Americas  2005      800       400
2        Europe  2000      300       500
3        Europe  2005      100       200
4  SouthAmerica  2000      500       200
5  SouthAmerica  2005      300       100

i want to subtract the SouthAmerica rows from the Americas rows across the planted & regrowth columns and add them as NorthAmerica at index 6&7

desired output:

        country  year  planted  regrowth
0      Americas  2000      900       300
1      Americas  2005      800       400
2        Europe  2000      300       500
3        Europe  2005      100       200
4  SouthAmerica  2000      500       200
5  SouthAmerica  2005      300       100
6  NorthAmerica  2000      400       100
7  NorthAmerica  2005      500       300

from similar questions I get, that I have to use groupby and .diff somehow but I can't wrap my head around the the mechanics and how to apply it to my problem


Solution

  • You can slice the Americas/SouthAmerica, subtract and concat:

    df1 = df.loc[df['country'].eq('Americas')].drop(columns='country').set_index('year')
    df2 = df.loc[df['country'].eq('SouthAmerica')].drop(columns='country').set_index('year')
    
    df = pd.concat([df, df1.sub(df2).reset_index().assign(country='NorthAmerica')],
                   ignore_index=True)
    

    Or:

    df2 = df.set_index(['country', 'year'])
    
    out = pd.concat([df2, pd.concat({'NorthAmerica': df2.loc['Americas'].sub(df2.loc['SouthAmerica'])})
                    ]).reset_index()
    
    

    Output:

            country  year  planted  regrowth
    0      Americas  2000      900       300
    1      Americas  2005      800       400
    2        Europe  2000      300       500
    3        Europe  2005      100       200
    4  SouthAmerica  2000      500       200
    5  SouthAmerica  2005      300       100
    6  NorthAmerica  2000      400       100
    7  NorthAmerica  2005      500       300