Working with a dataframe that looks like this (text version below):
I am supposed to calculate which country has scored the most goals since 2010 in tournaments. So far I have managed to manipulate the dataframe by filtering out friendlies like this:
no_friendlies = df[df.tournament != "Friendly"]
Then I set the date column to be the index in order to filter out all matches before 2010:
no_friendlies_indexed = no_friendlies.set_index('date')
since_2010 = no_friendlies_indexed.loc['2010-01-01':]
I am pretty lost from this point onward as I can't figure out how to sum goals scored by each country both home and away
Any help/advice is appreciated!
EDIT:
Text version of sample data:
date home_team away_team home_score away_score tournament city country neutral
0 1872-11-30 Scotland England 0 0 Friendly Glasgow Scotland False
1 1873-03-08 England Scotland 4 2 Friendly London England False
2 1874-03-07 Scotland England 2 1 Friendly Glasgow Scotland False
3 1875-03-06 England Scotland 2 2 Friendly London England False
4 1876-03-04 Scotland England 3 0 Friendly Glasgow Scotland False
5 1876-03-25 Scotland Wales 4 0 Friendly Glasgow Scotland False
6 1877-03-03 England Scotland 1 3 Friendly London England False
7 1877-03-05 Wales Scotland 0 2 Friendly Wrexham Wales False
8 1878-03-02 Scotland England 7 2 Friendly Glasgow Scotland False
9 1878-03-23 Scotland Wales 9 0 Friendly Glasgow Scotland False
10 1879-01-18 England Wales 2 1 Friendly London England False
EDIT 2:
I have just tried doing this:
since_2010.groupby(['home_team', 'home_score']).sum()
But it doesn't return the sum of home goals scored by the home teams (if this worked i would just repeat it for away teams to get total)
.groupby
and .sum()
for the home team and then do the same for the away team and add the two together:
df_new = df.groupby('home_team')['home_score'].sum() + df.groupby('away_team')['away_score'].sum()
output:
England 12
Scotland 34
Wales 1
More detailed explanation (per comment):
.groupby
one column home_team
. In your answer, you were grouping by ['home_team', 'home_score']
Your goal (no pun intended) is to get the .sum()
of the home_score
-- so you should NOT .groupby()
it. As you can see ['home_score']
is after the part where I use .groupby
, so that I can get the .sum()
of it. That gets you set for the home teams.away_team
.home_team
and away_team
groups have the same values for countries, you can simply add them together...