I am following this answer; however, I am unable to get the correct syntax for my purpose.
df.head()
     country         league             Win   DNB   O 1.5  U 4.5
113  Iceland         Urvalsdeild Women  3.19  3.43  4.89   2.10
135  Belgium         Jupiler League     1.99  1.99  3.59   2.40
165  Brazil          Serie D            1.71  1.98  3.80   1.90
238  Czech Republic  U19 League         2.90  2.90  4.70   2.25
244  China           Jia League         2.42  0.94  4.80   2.00
For each country, league pair I want the max of ('Win', 'DNB', 'O 1.5') and the min of ('U 4.5').
I am trying
df= df.groupby('country', 'league).agg({'Win':'max', 'DNB':'max', 'O 1.5':'max', 'U 4.5': 'min'})[['Win', 'DNB', 'O 1.5', 'U 4.5']].reset_index()
However, I am getting SyntaxError: invalid syntax
What would be the correct syntax?
The syntax error occurs because there is a missing quotation mark (') after 'league. However, if you fix that issue, you'll get another error:
ValueError: No axis named league for object type DataFrame
This error occurs because df.groupby('country', 'league') is equivalent to df.groupby(by='country', axis='league'), so 'league' is interpreted as an axis name rather than as a second grouping column (the first call passes positional arguments and the second passes keyword arguments).
Your question indicates that you want to group by both country and league, so pass them together as a list, using square brackets ([]).
FYI, groupby also has an as_index argument that is True by default. If you're going to call reset_index later, pass as_index=False from the beginning so you don't need reset_index at all.
Also, groupby.agg(...) outputs a dataframe whose columns are only those passed to agg (plus the group keys when as_index=False), so there is no need to select them again with [[...]].
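If you want to convince yourself that the shorter form gives the same result, here is a minimal sketch. The three-row dataframe is made up for illustration (it is not your data), and it only uses two of your value columns:

```python
import pandas as pd

# Hypothetical toy data: two rows share the same (country, league) pair.
df = pd.DataFrame({
    "country": ["Brazil", "Brazil", "China"],
    "league": ["Serie D", "Serie D", "Jia League"],
    "Win": [1.71, 1.80, 2.42],
    "U 4.5": [1.90, 1.85, 2.00],
})

spec = {"Win": "max", "U 4.5": "min"}

# Long form: aggregate, re-select the same columns, then reset the index.
long_form = (
    df.groupby(["country", "league"])
      .agg(spec)[["Win", "U 4.5"]]
      .reset_index()
)

# Short form: keep the group keys as ordinary columns from the start.
short_form = df.groupby(["country", "league"], as_index=False).agg(spec)

# Compare the two results.
print(long_form.equals(short_form))
print(list(short_form.columns))
```

Both forms should end up with the group keys followed by the aggregated columns, so the extra [[...]] and reset_index() steps add nothing.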
The final code:
df.groupby(['country', 'league'], as_index=False).agg({'Win':'max', 'DNB':'max', 'O 1.5':'max', 'U 4.5': 'min'})
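As a quick check, here is a sketch that runs this on the five rows from the question. Every group in that sample has a single row, so I've added one hypothetical extra Brazil / Serie D row (index 999, values invented) to make the max/min actually do something:

```python
import pandas as pd

# Rows from the question, plus ONE made-up row (index 999) for illustration.
df = pd.DataFrame(
    {
        "country": ["Iceland", "Belgium", "Brazil", "Czech Republic", "China", "Brazil"],
        "league": ["Urvalsdeild Women", "Jupiler League", "Serie D",
                   "U19 League", "Jia League", "Serie D"],
        "Win": [3.19, 1.99, 1.71, 2.90, 2.42, 2.05],
        "DNB": [3.43, 1.99, 1.98, 2.90, 0.94, 1.50],
        "O 1.5": [4.89, 3.59, 3.80, 4.70, 4.80, 3.10],
        "U 4.5": [2.10, 2.40, 1.90, 2.25, 2.00, 1.75],
    },
    index=[113, 135, 165, 238, 244, 999],
)

# Group by both columns (note the list) and aggregate each column separately.
result = df.groupby(["country", "league"], as_index=False).agg(
    {"Win": "max", "DNB": "max", "O 1.5": "max", "U 4.5": "min"}
)

# The two Brazil / Serie D rows collapse into one row.
brazil = result[result["country"] == "Brazil"].iloc[0]
print(result)
print(brazil["Win"], brazil["DNB"], brazil["O 1.5"], brazil["U 4.5"])
```

The Brazil row takes the larger Win, DNB and O 1.5 values and the smaller U 4.5 value from the two input rows, and the other four groups pass through unchanged.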