Search code examples
pythonpandasgroup-by

How to group by and get 1 max value and 2 max value rows in pandas?


Raw Data...

RefNo rest1 Pair rest2 rest3
A 11 E 55 7
B 22 F 66 6
C 33 G 77 5
C 33 H 88 4
D 44 I 99 3
D 44 J 111 2
D 44 K 222 1
  1. Groupby(RefNo) and get MaxValue(rest2) row...
RefNo rest1 Pair rest2 rest3
A 11 E 55 7
B 22 F 66 6
C 33 H 88 4
D 44 K 222 1
  1. Groupby(RefNo) and get 2 MaxValue(rest2) rows...
RefNo rest1 Pair rest2 rest3
A 11 E 55 7
B 22 F 66 6
C 33 H 88 4
C 33 G 77 5
D 44 K 222 1
D 44 J 111 2

Appreciate any help.


Solution

  • Using groupby.rank and boolean indexing:

    n = 2
    
    out = df[df.groupby('RefNo')['rest2'].rank('dense').le(n)]
    

    Output:

      RefNo  rest1 Pair  rest2  rest3
    0     A     11    E     55      7
    1     B     22    F     66      6
    2     C     33    G     77      5
    3     C     33    H     88      4
    4     D     44    I     99      3
    5     D     44    J    111      2
    

    Intermediates:

      RefNo  rest1 Pair  rest2  rest3  rank   mask
    0     A     11    E     55      7   1.0   True
    1     B     22    F     66      6   1.0   True
    2     C     33    G     77      5   1.0   True
    3     C     33    H     88      4   2.0   True
    4     D     44    I     99      3   1.0   True
    5     D     44    J    111      2   2.0   True
    6     D     44    K    222      1   3.0  False