I have a pandas dataframe with 18 columns, the columns from 2:17 hold numerical values with labels 2-17. I want to add a new column ('category') to the dataframe to include the label of the column which holds the maximum value if that value is higher than a certain number (0.5). I used the below code after utilizing a related question on #Pandas (python): max in columns define new value in new column
test_df['Category'] = test_df.values[:,2:18].argmax(1) if (test_df.values[:,2:18].max(1) >=0.5) else 'none'
when I run the above code, I get the error message "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", any advice is appreciated.
Use mask
to hide row where no values are greater than 0.5 then extract the column name with idxmax
.
test_df['Category'] = (
test_df.iloc[:, 2:18].mask(~test_df.iloc[:, 2:18].ge(0.5).any(1), other=pd.NA)
.idxmax(1).fillna('none')
)
Output:
>>> test_df
A B col0 col1 col2 col3 col4 col5 col6 col7 col8 col9 col10 col11 col12 col13 col14 col15 Category
0 alpha beta 0.004926 0.262662 0.059676 0.026302 0.360741 0.256310 0.472451 0.340764 0.472086 0.379545 0.437554 0.435562 0.438726 0.503708 0.193708 0.260441 col13
1 alpha beta 0.178689 0.326015 0.514489 0.050754 0.391687 0.153947 0.157198 0.396039 0.009823 0.275651 0.454966 0.204654 0.111680 0.250095 0.297196 0.183910 col2
2 alpha beta 0.513636 0.019905 0.418037 0.188359 0.393665 0.481321 0.196138 0.507833 0.042835 0.022343 0.156208 0.191423 0.258029 0.351852 0.354429 0.301106 col0
3 alpha beta 0.042417 0.472806 0.020204 0.411681 0.019293 0.140623 0.108013 0.136260 0.490850 0.004276 0.212354 0.470580 0.107478 0.011461 0.366930 0.100539 none
4 alpha beta 0.287596 0.317487 0.520418 0.048656 0.033835 0.519975 0.237952 0.449427 0.211287 0.204319 0.465478 0.343144 0.408842 0.050700 0.508506 0.169613 col2
5 alpha beta 0.154299 0.369279 0.188567 0.083248 0.004332 0.436839 0.329708 0.458652 0.329425 0.068757 0.238121 0.102302 0.497296 0.219990 0.041560 0.455992 none
6 alpha beta 0.361315 0.010401 0.039190 0.013335 0.236268 0.268206 0.407632 0.149902 0.041239 0.481870 0.338411 0.430095 0.335075 0.405940 0.108008 0.393766 none
7 alpha beta 0.087221 0.146438 0.202438 0.132151 0.126972 0.442878 0.011017 0.471010 0.229542 0.139989 0.476444 0.294640 0.057385 0.369218 0.433006 0.388687 none
8 alpha beta 0.519944 0.083779 0.520042 0.382255 0.315839 0.083350 0.038597 0.114521 0.160149 0.424750 0.192139 0.457243 0.054164 0.252922 0.133647 0.240518 col2
9 alpha beta 0.246072 0.172740 0.002171 0.046882 0.132817 0.365952 0.252149 0.516113 0.050739 0.054263 0.331012 0.493549 0.433013 0.484111 0.118902 0.475689 col7
Setup MRE
np.random.seed(2022)
test_df = pd.DataFrame(np.random.random((10, 16))) / 1.9
test_df.columns = 'col' + test_df.columns.astype(str)
tmp_df = pd.DataFrame({'A': 'alpha', 'B': 'beta'}, index=test_df.index)
test_df = pd.concat([tmp_df, test_df], axis=1)