This is a revisit to the question Add columns to pandas dataframe containing max of each row, AND corresponding column name where a solution was provided using the now deprecated method ix
. How can you do the same thing using iloc
or loc
instead? I've tried both, but I'm getting:
IndexError: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 5
Here's a sample DataFrame:
a b c maxval
0 1 0 0 1
1 0 0 0 0
2 0 1 0 1
3 1 0 0 1
4 3 1 0 3
And here's the desired output:
a b c maxval maxcol
0 1 0 0 1 a
1 0 0 0 0 a,b,c
2 0 1 0 1 b
3 1 0 0 1 a
4 3 1 0 3 a
Here ix
is used to select the columns up to c
, you can do the same with loc
:
df['maxcol'] = (df.loc[:, :'c'].eq(df['maxval'], axis=0)
.apply(lambda x: ','.join(df.columns[:3][x==x.max()]),axis=1)
)
Or, since [:3]
is used later with iloc
:
df['maxcol'] = (df.iloc[:, :3].eq(df['maxval'], axis=0)
.apply(lambda x: ','.join(df.columns[:3][x==x.max()]),axis=1)
)
A variant with a dot product:
tmp = df.loc[:, :'c']
df['maxcol'] = (tmp.eq(df['maxval'], axis=0)
@ (tmp.columns+',')).str[:-1]
Or melt
+groupby.agg
:
df['maxcol'] = (df
.melt('maxval', ignore_index=False)
.query('maxval == value')
.groupby(level=0)['variable'].agg(','.join)
)
Output:
a b c maxval maxcol
0 1 0 0 1 a
1 0 0 0 0 a,b,c
2 0 1 0 1 b
3 1 0 0 1 a
4 3 1 0 3 a