Search code examples
pythonpandasdataframemax

pandas idxmax return all columns with maximum value


I have a dataframe which looks like the following:

column_1    column_2
2           3
6           4
5           5
.
.
.

I would like to return for each row of the above dataframe the name(s) of the column(s) which have the maximum value in that row; for example for the first row I want to return 'column_2', and for the third row it will return both column names since they both have the same value.

I have tried to achieve it by using Pandas's idxmax function; but this function returns only the first occurrence and ignores the rest of the possibilities; like in the example of the third row above, idxmax returns only 'column_1' instead of ['column_1', 'column_2'].

Any idea how to solve this problem. Thanks in advance for your help.


Solution

  • For each row, you can check if the entry is equal to the maximum of that row; this will form a boolean frame. Then you can dot product it with the column names to choose those columns' names that gave True for the rows:

    is_max = df.eq(df.max(axis=1), axis=0)
    result = is_max.dot(df.columns + " ")
    

    where axis=1 of max says take the maximum of each row and axis=0 of eq says align the argument (i.e., df.max(axis=1)) to compare row-wise i.e., broadcast so),

    to get

    >>> is_max
    
       column_1  column_2
    0     False      True
    1      True     False
    2      True      True
    
    >>> result
    
    0             column_2
    1             column_1
    2    column_1 column_2