python dataframe group-by max min

What is the correct syntax for groupby max min


I am following this answer; however, I am unable to get the correct syntax for my purpose.

df.head()
            country             league   Win   DNB  O 1.5  U 4.5
113         Iceland  Urvalsdeild Women  3.19  3.43   4.89   2.10
135         Belgium     Jupiler League  1.99  1.99   3.59   2.40
165          Brazil            Serie D  1.71  1.98   3.80   1.90
238  Czech Republic         U19 League  2.90  2.90   4.70   2.25
244           China         Jia League  2.42  0.94   4.80   2.00

For each (country, league) pair, I want the max of 'Win', 'DNB', and 'O 1.5', and the min of 'U 4.5'.

I am trying

df= df.groupby('country', 'league).agg({'Win':'max', 'DNB':'max', 'O 1.5':'max', 'U 4.5': 'min'})[['Win', 'DNB', 'O 1.5', 'U 4.5']].reset_index()

However, I am getting SyntaxError: invalid syntax

What would be the correct syntax?


Solution

  • The syntax error occurs because the closing quotation mark (') is missing after 'league. However, once you fix that, you'll get another error:

    ValueError: No axis named league for object type DataFrame
    

    This happens because df.groupby('country', 'league') passes 'league' as the second positional argument, which groupby interprets as axis. The call is therefore equivalent to df.groupby(by='country', axis='league').

    Your question indicates that you want to group by both country and league, so pass them together as a single list, using square brackets ([]).

    FYI, groupby also has an as_index argument, which is True by default. If you're going to call reset_index afterwards, pass as_index=False from the start and you won't need reset_index at all.

    Also, groupby.agg(...) returns a DataFrame whose value columns are only those passed to agg, so there is no need to select them again with [[...]].

    The final code:

    df.groupby(['country', 'league'], as_index=False).agg({'Win':'max', 'DNB':'max', 'O 1.5':'max', 'U 4.5': 'min'})
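
To try the final code end to end, here is a minimal, self-contained sketch. It assumes the five rows shown in the question's df.head() output, plus one invented extra 'China' / 'Jia League' row (not in the original data) so that at least one group actually has more than one row to aggregate. The names agg_spec, result and same are only illustrative.

```python
import pandas as pd

# Rows copied from df.head() in the question. The last row is made up
# (an assumption for illustration) so one group contains two rows.
df = pd.DataFrame(
    {
        "country": ["Iceland", "Belgium", "Brazil", "Czech Republic", "China", "China"],
        "league": ["Urvalsdeild Women", "Jupiler League", "Serie D",
                   "U19 League", "Jia League", "Jia League"],
        "Win": [3.19, 1.99, 1.71, 2.90, 2.42, 2.60],
        "DNB": [3.43, 1.99, 1.98, 2.90, 0.94, 1.10],
        "O 1.5": [4.89, 3.59, 3.80, 4.70, 4.80, 4.50],
        "U 4.5": [2.10, 2.40, 1.90, 2.25, 2.00, 1.80],
    }
)

# Column -> aggregation function, exactly as in the answer.
agg_spec = {"Win": "max", "DNB": "max", "O 1.5": "max", "U 4.5": "min"}

# Group by both keys (passed as a list) and keep them as regular columns.
result = df.groupby(["country", "league"], as_index=False).agg(agg_spec)

# The longer form from the question, with the syntax fixed, for comparison.
same = df.groupby(["country", "league"]).agg(agg_spec).reset_index()

print(result)
```

Both result and same should hold one row per (country, league) pair, with the grouping keys as ordinary columns followed by the four aggregated columns.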