I have a dataframe which looks like the following:
column_1 column_2
2 3
6 4
5 5
.
.
.
I would like to return for each row of the above dataframe the name(s) of the column(s) which have the maximum value in that row; for example for the first row I want to return 'column_2', and for the third row it will return both column names since they both have the same value.
I have tried to achieve it by using Pandas's idxmax function; but this function returns only the first occurrence and ignores the rest of the possibilities; like in the example of the third row above, idxmax returns only 'column_1' instead of ['column_1', 'column_2'].
Any idea how to solve this problem. Thanks in advance for your help.
For each row, you can check if the entry is equal to the maximum of that row; this will form a boolean frame. Then you can dot
product it with the column names to choose those columns' names that gave True
for the rows:
is_max = df.eq(df.max(axis=1), axis=0)
result = is_max.dot(df.columns + " ")
where axis=1
of max
says take the maximum of each row and axis=0
of eq
says align the argument (i.e., df.max(axis=1)
) to compare row-wise i.e., broadcast so),
to get
>>> is_max
column_1 column_2
0 False True
1 True False
2 True True
>>> result
0 column_2
1 column_1
2 column_1 column_2