Search code examples
pythonpython-3.xpandasdataframemax

Column name of maximum of each row in a dataframe


I have a dataframe and i wanted a column filled with maximum value of each row so i used this :

 df_1['Highest_Rew_patch'] = df_1.max(axis=1)

output:

    Patch_0  Patch_1  Patch_2  ...  Patch_7  exp_patch  Highest_Rew_patch
0      0.0     70.0     70.0  ...      0.0          3               70.0
1      0.0     74.0     74.0  ...      0.0          4               74.0
2      0.0     78.0     78.0  ...      0.0          4               78.0
3      0.0     82.0     82.0  ...      0.0          4               82.0
4      0.0     82.0     82.0  ...      0.0          5               82.0
5      0.0     86.0     86.0  ...      0.0          6               86.0
6      0.0     90.0     90.0  ...      0.0          6               90.0
7      0.0     94.0     94.0  ...      0.0          6               94.0
8      0.0     98.0     98.0  ...      0.0          6               98.0
9      0.0     98.0     98.0  ...      0.0          7               98.0

However i want a little different result:

Patch_0  Patch_1  Patch_2   Patch_7  exp_patch  Highest_Rew_patch
0        0.0      70.0      70.0       3            Patch_2,Patch_7...
1        0.0      74.0      74.0       4            Patch_2,Patch_7...

So instead of the highest value of the row , i want the header of that column for that particular row which has highest value(s).


Solution

  • You can select the columns starting by Patch and then just keep the column names where the value is equal to the max:

    >> s = df.iloc[:, df.columns.str.startswith('Patch')].apply(
       lambda s: s.index[s.eq(s.max())].tolist(), axis=1)
    
    >> s 
    
    0    [Patch_1, Patch_2]
    1    [Patch_1, Patch_2]
    2    [Patch_1, Patch_2]
    3    [Patch_1, Patch_2]
    4    [Patch_1, Patch_2]
    5    [Patch_1, Patch_2]
    6    [Patch_1, Patch_2]
    7    [Patch_1, Patch_2]
    8    [Patch_1, Patch_2]
    9    [Patch_1, Patch_2]
    

    Or joining as string:

    >> s = s.apply(lambda s: ','.join(s))
    
    >> s
    0    Patch_1,Patch_2
    1    Patch_1,Patch_2
    2    Patch_1,Patch_2
    3    Patch_1,Patch_2
    4    Patch_1,Patch_2
    5    Patch_1,Patch_2
    6    Patch_1,Patch_2
    7    Patch_1,Patch_2
    8    Patch_1,Patch_2
    9    Patch_1,Patch_2
    dtype: object
    

    Then just assign the new column:

    df['Highest_Rew_patch'] = s
    

    Update

    To select other order than the max, we can use nlargest, take advantage that it remove duplicates and then select the last element:

    >> n_order = 2
    >> s = df.iloc[:, df.columns.str.startswith('Patch')].apply(
          lambda s: s.index[s.eq(s.nlargest(n_order)[-1])].tolist(), axis=1)
    
    >> s
    0    [Patch_1, Patch_2]
    1    [Patch_1, Patch_2]
    2    [Patch_1, Patch_2]
    3    [Patch_1, Patch_2]
    4    [Patch_1, Patch_2]
    5    [Patch_1, Patch_2]
    6    [Patch_1, Patch_2]
    7    [Patch_1, Patch_2]
    8    [Patch_1, Patch_2]
    9    [Patch_1, Patch_2]
    dtype: object