I calculated the differences of multiple columns, grouped by Country and ordered by Year. The original dataset looks like this (only a subset; the real data has all countries):
Country | Year | Continent | Inhabitants | Score1 | Score2 | Score3 | Score4 |
---|---|---|---|---|---|---|---|
Brazil | 2021 | South America | int | 6.1 | 7.2 | 4.2 | 9.2 |
Brazil | 2020 | South America | int | 6.9 | 7.0 | 4.9 | 7.2 |
Brazil | 2019 | South America | int | 5.6 | 3.4 | 2.5 | 8.4 |
Germany | 2021 | Europe | int | 5.6 | 3.4 | 2.5 | 8.4 |
Germany | 2020 | Europe | int | 5.6 | 3.4 | 2.5 | 8.4 |
Germany | 2019 | Europe | int | 5.6 | 3.4 | 2.5 | 8.4 |
Japan | 2021 | Asia | int | 5.6 | 3.4 | 2.5 | 8.4 |
Japan | 2020 | Asia | int | 5.6 | 3.4 | 2.5 | 8.4 |
Japan | 2019 | Asia | int | 5.6 | 3.4 | 2.5 | 8.4 |
I figured out how to calculate the differences for each Score column with:
df['diff1'] = df.groupby(['Country'])['Score1'].diff()
df['diff2'] = df.groupby(['Country'])['Score2'].diff()
df['diff3'] = df.groupby(['Country'])['Score3'].diff()
df['diff4'] = df.groupby(['Country'])['Score4'].diff()
Is there an easier way to do this than applying the same code four times (once per column)? With a loop, maybe?
df.groupby('Country')[['Score1', 'Score2', 'Score3', 'Score4']].diff()
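To get the same `diff1` … `diff4` columns back onto the original frame, the result of the grouped `diff()` can be renamed and joined on the index. This is only a minimal sketch: the small `df` below is built from a few rows of the table in the question (without the Continent and Inhabitants columns), and the names `score_cols`, `diff_cols`, and `result` are just illustrative.

```python
import pandas as pd

# Sample rows copied from the question's table (illustrative subset only).
df = pd.DataFrame({
    "Country": ["Brazil", "Brazil", "Brazil", "Germany", "Germany", "Germany"],
    "Year": [2021, 2020, 2019, 2021, 2020, 2019],
    "Score1": [6.1, 6.9, 5.6, 5.6, 5.6, 5.6],
    "Score2": [7.2, 7.0, 3.4, 3.4, 3.4, 3.4],
    "Score3": [4.2, 4.9, 2.5, 2.5, 2.5, 2.5],
    "Score4": [9.2, 7.2, 8.4, 8.4, 8.4, 8.4],
})

score_cols = ["Score1", "Score2", "Score3", "Score4"]
diff_cols = ["diff1", "diff2", "diff3", "diff4"]

# One grouped diff over all score columns at once.
# diff() follows the existing row order within each country, exactly like
# the four separate calls in the question; sort by Year first if needed.
diffs = df.groupby("Country")[score_cols].diff()

# Rename Score1..Score4 -> diff1..diff4 and attach them by index.
diffs.columns = diff_cols
result = df.join(diffs)

print(result)
```

The first row of each country is `NaN` in every diff column, since there is no previous row within the group to subtract.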