Search code examples
pythonpandasdataframemaxargmax

Return label of the column which has the maximum value if it is higher than a certain number


I have a pandas dataframe with 18 columns, the columns from 2:17 hold numerical values with labels 2-17. I want to add a new column ('category') to the dataframe to include the label of the column which holds the maximum value if that value is higher than a certain number (0.5). I used the below code after utilizing a related question on #Pandas (python): max in columns define new value in new column

  test_df['Category'] = test_df.values[:,2:18].argmax(1) if (test_df.values[:,2:18].max(1) >=0.5) else 'none'

when I run the above code, I get the error message "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", any advice is appreciated.


Solution

  • Use mask to hide row where no values are greater than 0.5 then extract the column name with idxmax.

    test_df['Category'] = (
        test_df.iloc[:, 2:18].mask(~test_df.iloc[:, 2:18].ge(0.5).any(1), other=pd.NA)
               .idxmax(1).fillna('none')
    )
    

    Output:

    >>> test_df
           A     B      col0      col1      col2      col3      col4      col5      col6      col7      col8      col9     col10     col11     col12     col13     col14     col15 Category
    0  alpha  beta  0.004926  0.262662  0.059676  0.026302  0.360741  0.256310  0.472451  0.340764  0.472086  0.379545  0.437554  0.435562  0.438726  0.503708  0.193708  0.260441    col13
    1  alpha  beta  0.178689  0.326015  0.514489  0.050754  0.391687  0.153947  0.157198  0.396039  0.009823  0.275651  0.454966  0.204654  0.111680  0.250095  0.297196  0.183910     col2
    2  alpha  beta  0.513636  0.019905  0.418037  0.188359  0.393665  0.481321  0.196138  0.507833  0.042835  0.022343  0.156208  0.191423  0.258029  0.351852  0.354429  0.301106     col0
    3  alpha  beta  0.042417  0.472806  0.020204  0.411681  0.019293  0.140623  0.108013  0.136260  0.490850  0.004276  0.212354  0.470580  0.107478  0.011461  0.366930  0.100539     none
    4  alpha  beta  0.287596  0.317487  0.520418  0.048656  0.033835  0.519975  0.237952  0.449427  0.211287  0.204319  0.465478  0.343144  0.408842  0.050700  0.508506  0.169613     col2
    5  alpha  beta  0.154299  0.369279  0.188567  0.083248  0.004332  0.436839  0.329708  0.458652  0.329425  0.068757  0.238121  0.102302  0.497296  0.219990  0.041560  0.455992     none
    6  alpha  beta  0.361315  0.010401  0.039190  0.013335  0.236268  0.268206  0.407632  0.149902  0.041239  0.481870  0.338411  0.430095  0.335075  0.405940  0.108008  0.393766     none
    7  alpha  beta  0.087221  0.146438  0.202438  0.132151  0.126972  0.442878  0.011017  0.471010  0.229542  0.139989  0.476444  0.294640  0.057385  0.369218  0.433006  0.388687     none
    8  alpha  beta  0.519944  0.083779  0.520042  0.382255  0.315839  0.083350  0.038597  0.114521  0.160149  0.424750  0.192139  0.457243  0.054164  0.252922  0.133647  0.240518     col2
    9  alpha  beta  0.246072  0.172740  0.002171  0.046882  0.132817  0.365952  0.252149  0.516113  0.050739  0.054263  0.331012  0.493549  0.433013  0.484111  0.118902  0.475689     col7
    

    Setup MRE

    np.random.seed(2022)
    test_df = pd.DataFrame(np.random.random((10, 16))) / 1.9
    test_df.columns = 'col' + test_df.columns.astype(str)
    tmp_df = pd.DataFrame({'A': 'alpha', 'B': 'beta'}, index=test_df.index)
    test_df = pd.concat([tmp_df, test_df], axis=1)