Tags: python, pandas, group-by, aggregate

python pandas groupby aggregate with min function


I have a dataframe like:

import pandas as pd

df = pd.DataFrame({'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
                   'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
                   'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
                   'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12]})

I would like to group this by year and label and return, for each group, the row with the lowest value in the value column. The row has to include the original index value as well as all the columns of df.

So the result should look like this (original index, year, label, cat, value):

 3  2018  A  555   1
 1  2018  B  546   5
 2  2018  C  671   6
 5  2018  D  229   4
 6  2019  A  811   7
10  2019  E  701   3
 8  2019  F  227  13
 9  2019  G  341  11

My code so far:

df.groupby(by=['year', 'label']).min().reset_index()

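This returns the correct minimum for value, but not the matching cat, because min() is computed for each column independently, so cat and value can come from different rows of the same group. The original index is also missing.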

Any suggestions on how to solve this?


Solution

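  • You can do this with idxmin(), which returns for each group the index label of the row holding the smallest value. Passing those labels to df.loc selects the complete rows and keeps the original index: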

    df.loc[df.groupby(['year', 'label'])['value'].idxmin()]
    

    Output:

        year label  cat  value
    3   2018     A  555      1
    1   2018     B  546      5
    2   2018     C  671      6
    5   2018     D  229      4
    6   2019     A  811      7
    10  2019     E  701      3
    8   2019     F  227     13
    9   2019     G  341     11
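
To make the one-liner above easier to follow, here is a minimal self-contained sketch that splits it into steps and contrasts it with the min() attempt from the question. The variable names per_column_min, min_row_labels, and result are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2018, 2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019, 2019],
    'label': ['A', 'B', 'C', 'A', 'B', 'D', 'A', 'E', 'F', 'G', 'E', 'E'],
    'cat': [236, 546, 671, 555, 871, 229, 811, 992, 227, 341, 701, 508],
    'value': [2, 5, 6, 1, 9, 4, 7, 8, 13, 11, 3, 12],
})

# min() reduces every column on its own, so one output row can
# combine values taken from different rows of the same group.
per_column_min = df.groupby(['year', 'label']).min()

# idxmin() returns, for each group, the index label of the row
# with the smallest 'value'.
min_row_labels = df.groupby(['year', 'label'])['value'].idxmin()

# .loc selects those complete rows and keeps the original index.
result = df.loc[min_row_labels]
print(result)
```

For the group (2018, A), per_column_min pairs cat 236 (from row 0) with value 1 (from row 3), which is why the cat column looked wrong in the question, while result keeps row 3 intact. If a group contains ties for the lowest value, idxmin() returns the first occurrence.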