Search code examples
pythonsortingmaxdata-cleaning

How to find the index of the 2nd (or 3rd) largest value of a column in a df (Python)?


I want to find the second largest value of each column but want to collect the position where this value can be found (in short: what is the equivalent of .idxmax when using .nlargest(2).values[-1] ?)

Here is my reasoning to obtain the 2nd and 3rd highest values:

test_2ndmax = pd.DataFrame({'Col{}'.format(i):np.random.randint(0,100,5) for i in range(5)})
display(test_2ndmax)

#retrieving 2nd higest value for each column
display(test_2ndmax.apply(lambda col: col.nlargest(2).values[-1],axis=0))

#retrieving to get 3rd higest value
display(test_2ndmax.apply(lambda col: col.nlargest(3).values[-1],axis=0))

The output is as such:

    Col0    Col1    Col2    Col3    Col4
0   9       15      24      45      85
1   26      50      91      34      60
2   3       88      84      17      53
3   8       58      73      56      11
4   82      65      93      3       46

Col0    82
Col1    65
Col2    91
Col3    45
Col4    60
dtype: int32

Col0    26
Col1    58
Col2    84
Col3    34
Col4    53
dtype: int32

However, I would like to get this, as I would using an equivalent of idxmax: (exemple for col.nlargest(2).values[-1]),

Col0 4
Col1 4
Col2 1
Col3 0
Col4 1

Thank you!


Solution

  • To get index of second largest value use .nlargest(2) + .idxmin() (similar for third largest...):

    x = test_2ndmax.apply(lambda col: col.nlargest(2).idxmin(), axis=0)
    print(x)
    

    Prints:

    Col0    3
    Col1    3
    Col2    4
    Col3    4
    Col4    1
    dtype: int64
    

    DataFrame used:

       Col0  Col1  Col2  Col3  Col4
    0    64    10     6    49    94
    1     4    22    86    79    82
    2    84    92    25     1    43
    3    87    41    18    51    29
    4    96    40    73    70    74