Search code examples
pandasindexingargmax

List of the (row, col) of the n largest values in a numeric pandas DataFrame?


Given a Pandas DataFrame of numeric values how can one produce a list of the .loc cell locations that one can then use to then obtain the corresponding n largest values in the entire DataFame?

For example:

A B C D E
X 1.3 3.6 33 61.38 0.3
Y 3.14 2.71 64 23.2 21
Z 1024 42 66 137 22.2
T 63.123 111 1.23 14.16 50.49

An n of 3 would produce the (row,col) pairs for the values 1024, 137 and 111.

These locations could then, as usual, be fed to .loc to extract those values from the DataFrame. i.e.

df.loc['Z','A']
df.loc['Z','D']
df.loc['T','B']

Note: It is easy to mistake this question for one that involves .idxmax. That isn't applicable due to the fact that there may be multiple values selected from a row and/or column in the n largest.


Solution

  • You could try:

    >>> data = {0 : [1.3, 3.14, 1024, 63.123], 1: [3.6, 2.71, 42, 111], 2 : [33, 64, 66, 1.23], 3 : [61.38, 23.2, 137, 14.16], 4 : [0.3, 21, 22.2, 50.49] }
    >>> df = pd.DataFrame(data)
    >>> df
              0       1      2       3      4
    0     1.300    3.60  33.00   61.38   0.30
    1     3.140    2.71  64.00   23.20  21.00
    2  1024.000   42.00  66.00  137.00  22.20
    3    63.123  111.00   1.23   14.16  50.49
    >>>
    >>> a = list(zip(*df.stack().nlargest(3).index.labels))
    >>> a
    [(2, 0), (2, 3), (3, 1)]
    >>> # then ...
    >>> df.loc[a[0]]
    1024.0
    >>> 
    >>> # all sorted in decreasing order ...
    >>> list(zip(*df.stack().nlargest(20).index.labels))
    [(2, 0), (2, 3), (3, 1), (2, 2), (1, 2), (3, 0), (0, 3), (3, 4), (2, 1), (0, 2), (1, 3), (2, 4), (1, 4), (3, 3), (0, 1), (1, 0), (1, 1), (0, 0), (3, 2), (0, 4)]
    

    Edit: In pandas versions 0.24.0 and above, MultiIndex.labels has been replaced by MultiIndex.codes(see Deprecations in What’s new in 0.24.0 (January 25, 2019)). The above code will throw AttributeError: 'MultiIndex' object has no attribute 'labels' and needs to be updated as follows:

    >>> a = list(zip(*df.stack().nlargest(3).index.codes))
    >>> a
    [(2, 0), (2, 3), (3, 1)]
    

    Edit 2: This question has become a "moving target", as the OP keeps changing it (this is my last update/edit). In the last update, OP's dataframe looks as follows:

    >>> data = {'A' : [1.3, 3.14, 1024, 63.123], 'B' : [3.6, 2.71, 42, 111], 'C' : [33, 64, 66, 1.23], 'D' : [61.38, 23.2, 137, 14.16], 'E' : [0.3, 21, 22.2, 50.49] }
    >>> df = pd.DataFrame(data, index=['X', 'Y', 'Z', 'T'])
    >>> df
              A       B      C       D      E
    X     1.300    3.60  33.00   61.38   0.30
    Y     3.140    2.71  64.00   23.20  21.00
    Z  1024.000   42.00  66.00  137.00  22.20
    T    63.123  111.00   1.23   14.16  50.49
    

    The desired output can be obtained using:

    >>> a = df.stack().nlargest(3).index 
    >>> a
    MultiIndex([('Z', 'A'),
                ('Z', 'D'),
                ('T', 'B')],
               )
    >>>
    >>> df.loc[a[0]]
    1024.0