following dataframe:
df = pd.DataFrame({"country": ["Americas", "Americas","Europe", "Europe", "SouthAmerica", "SouthAmerica"], "year":[2000, 2005, 2000, 2005, 2000, 2005], "planted":[100, 200, 300, 100, 500, 300], "regrowth":[200, 400, 500, 200, 700, 100]})
country year planted regrowth
0 Americas 2000 900 300
1 Americas 2005 800 400
2 Europe 2000 300 500
3 Europe 2005 100 200
4 SouthAmerica 2000 500 200
5 SouthAmerica 2005 300 100
i want to subtract the SouthAmerica rows from the Americas rows across the planted & regrowth columns and add them as NorthAmerica at index 6&7
desired output:
country year planted regrowth
0 Americas 2000 900 300
1 Americas 2005 800 400
2 Europe 2000 300 500
3 Europe 2005 100 200
4 SouthAmerica 2000 500 200
5 SouthAmerica 2005 300 100
6 NorthAmerica 2000 400 100
7 NorthAmerica 2005 500 300
from similar questions I get, that I have to use groupby and .diff somehow but I can't wrap my head around the the mechanics and how to apply it to my problem
You can slice the Americas/SouthAmerica, subtract and concat
:
df1 = df.loc[df['country'].eq('Americas')].drop(columns='country').set_index('year')
df2 = df.loc[df['country'].eq('SouthAmerica')].drop(columns='country').set_index('year')
df = pd.concat([df, df1.sub(df2).reset_index().assign(country='NorthAmerica')],
ignore_index=True)
Or:
df2 = df.set_index(['country', 'year'])
out = pd.concat([df2, pd.concat({'NorthAmerica': df2.loc['Americas'].sub(df2.loc['SouthAmerica'])})
]).reset_index()
Output:
country year planted regrowth
0 Americas 2000 900 300
1 Americas 2005 800 400
2 Europe 2000 300 500
3 Europe 2005 100 200
4 SouthAmerica 2000 500 200
5 SouthAmerica 2005 300 100
6 NorthAmerica 2000 400 100
7 NorthAmerica 2005 500 300