Search code examples
pythonpandasnumpydataframedata-analysis

Largest (n) numbers with Index and Column name in Pandas DataFrame


I wish to find out the largest 5 numbers in a DataFrame and store the Index name and Column name for these 5 values.

I am trying to use nlargest() and idxmax methods but failing to achieve what i want. My code is as below:

import numpy as np
import pandas as pd
from pandas import Series, DataFrame
df = DataFrame({'a': [1, 10, 8, 11, -1],'b': [1.0, 2.0, 6, 3.0, 4.0],'c': [1.0, 2.0, 6, 3.0, 4.0]})

Can you kindly let me know How can i achieve this. Thank you


Solution

  • Use stack and nlargest:

    max_vals = df.stack().nlargest(5)
    

    This will give you a Series with a multiindex, where the first level is the original DataFrame's index, and the second level is the column name for the given value. Here's what max_vals looks like:

    3  a    11.0
    1  a    10.0
    2  a     8.0
       b     6.0
       c     6.0
    

    To explicitly get the index and column names, use get_level_values on the index of max_vals:

    max_idx = max_vals.index.get_level_values(0)
    max_cols = max_vals.index.get_level_values(1)
    

    The result of max_idx:

    Int64Index([3, 1, 2, 2, 2], dtype='int64')
    

    The result of max_cols:

    Index(['a', 'a', 'a', 'b', 'c'], dtype='object')