Search code examples
pythonpandasdataframegroup-bymax

Get the row(s) which have the max value in groups using groupby


How do I find all rows in a pandas DataFrame which have the max value for count column, after grouping by ['Sp','Mt'] columns?

Example 1: the following DataFrame:

   Sp   Mt Value   count
0  MM1  S1   a     **3**
1  MM1  S1   n       2
2  MM1  S3   cb    **5**
3  MM2  S3   mk    **8**
4  MM2  S4   bg    **10**
5  MM2  S4   dgd     1
6  MM4  S2   rd      2
7  MM4  S2   cb      2
8  MM4  S2   uyi   **7**

Expected output is to get the result rows whose count is max in each group, like this:

   Sp   Mt   Value  count
0  MM1  S1   a      **3**
2  MM1  S3   cb     **5**
3  MM2  S3   mk     **8**
4  MM2  S4   bg     **10** 
8  MM4  S2   uyi    **7**

Example 2:

   Sp   Mt   Value  count
4  MM2  S4   bg     10
5  MM2  S4   dgd    1
6  MM4  S2   rd     2
7  MM4  S2   cb     8
8  MM4  S2   uyi    8

Expected output:

   Sp   Mt   Value  count
4  MM2  S4   bg     10
7  MM4  S2   cb     8
8  MM4  S2   uyi    8

Solution

  • Firstly, we can get the max count for each group like this:

    In [1]: df
    Out[1]:
        Sp  Mt Value  count
    0  MM1  S1     a      3
    1  MM1  S1     n      2
    2  MM1  S3    cb      5
    3  MM2  S3    mk      8
    4  MM2  S4    bg     10
    5  MM2  S4   dgd      1
    6  MM4  S2    rd      2
    7  MM4  S2    cb      2
    8  MM4  S2   uyi      7
    
    In [2]: df.groupby(['Sp', 'Mt'])['count'].max()
    Out[2]:
    Sp   Mt
    MM1  S1     3
         S3     5
    MM2  S3     8
         S4    10
    MM4  S2     7
    Name: count, dtype: int64
    

    To get the indices of the original DF you can do:

    In [3]: idx = df.groupby(['Sp', 'Mt'])['count'].transform(max) == df['count']
    
    In [4]: df[idx]
    Out[4]:
        Sp  Mt Value  count
    0  MM1  S1     a      3
    2  MM1  S3    cb      5
    3  MM2  S3    mk      8
    4  MM2  S4    bg     10
    8  MM4  S2   uyi      7
    

    Note that if you have multiple max values per group, all will be returned.


    Update

    On a Hail Mary chance that this is what the OP is requesting:

    In [5]: df['count_max'] = df.groupby(['Sp', 'Mt'])['count'].transform(max)
    
    In [6]: df
    Out[6]:
        Sp  Mt Value  count  count_max
    0  MM1  S1     a      3          3
    1  MM1  S1     n      2          3
    2  MM1  S3    cb      5          5
    3  MM2  S3    mk      8          8
    4  MM2  S4    bg     10         10
    5  MM2  S4   dgd      1         10
    6  MM4  S2    rd      2          7
    7  MM4  S2    cb      2          7
    8  MM4  S2   uyi      7          7