Search code examples
pythonpandas

How to add columns to a Pandas DataFrame containing max of each row, AND corresponding column name using iloc or iloc?


This is a revisit to the question Add columns to pandas dataframe containing max of each row, AND corresponding column name where a solution was provided using the now deprecated method ix. How can you do the same thing using iloc or loc instead? I've tried both, but I'm getting:

IndexError: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 5

Here's a sample DataFrame:

   a  b  c  maxval
0  1  0  0       1
1  0  0  0       0
2  0  1  0       1
3  1  0  0       1
4  3  1  0       3

And here's the desired output:

    a  b  c  maxval maxcol
0   1  0  0  1      a
1   0  0  0  0      a,b,c
2   0  1  0  1      b
3   1  0  0  1      a
4   3  1  0  3      a

Solution

  • Here ix is used to select the columns up to c, you can do the same with loc:

    df['maxcol'] = (df.loc[:, :'c'].eq(df['maxval'], axis=0)
                      .apply(lambda x: ','.join(df.columns[:3][x==x.max()]),axis=1)
                   )
    

    Or, since [:3] is used later with iloc:

    df['maxcol'] = (df.iloc[:, :3].eq(df['maxval'], axis=0)
                      .apply(lambda x: ','.join(df.columns[:3][x==x.max()]),axis=1)
                   )
    

    A variant with a dot product:

    tmp = df.loc[:, :'c']
    df['maxcol'] = (tmp.eq(df['maxval'], axis=0)
                    @ (tmp.columns+',')).str[:-1]
    

    Or melt+groupby.agg:

    df['maxcol'] = (df
       .melt('maxval', ignore_index=False)
       .query('maxval == value')
       .groupby(level=0)['variable'].agg(','.join)
    )
    

    Output:

       a  b  c  maxval maxcol
    0  1  0  0       1      a
    1  0  0  0       0  a,b,c
    2  0  1  0       1      b
    3  1  0  0       1      a
    4  3  1  0       3      a