
Flag the max value in each row of a DataFrame as True and the rest as False


I have a DataFrame that I am rounding. After rounding, I subtract the original from the result. This gives me a DataFrame with the same shape as the original, holding the amount of change the rounding operation caused in each cell.

I need to transform this into a Boolean DataFrame in which the max of each row is flagged True and everything else in the row is False. Every step but the final one is vectorized, and I can't figure out how to vectorize the last step. This is what I am currently doing:

import pandas as pd

a = pd.DataFrame([[2.290119, 5.300725, 17.266693, 75.134857, 0.000000, 0.000000, 0.007606],
                  [0.000000, 7.560276, 55.579175, 36.858266, 0.000000, 0.000000, 0.002284],
                  [0.001574, 15.225538, 39.309742, 45.373800, 0.000951, 0.001198, 0.087197],
                  [0.000000, 55.085390, 15.547927, 29.327661, 0.000000, 0.017691, 0.021331],
                  [0.000000, 66.283488, 15.636673, 17.912315, 0.000000, 0.003185, 0.164339]])

b = a.round(-1)  # round to the tens place (not tenths)
c = b - a        # change introduced by rounding, per cell
# row-wise: flag every cell equal to its row's max
round_modifier = c.apply(lambda x: x.eq(x.max()), axis="columns")
print(round_modifier)
       0      1      2      3      4      5      6
0  False  False  False   True  False  False  False
1  False  False   True  False  False  False  False
2  False   True  False  False  False  False  False
3  False   True  False  False  False  False  False
4  False  False   True  False  False  False  False

I am aware of DataFrame.idxmax(axis="columns"), which gives me the column name (of each row) where the max is found, but I can't seem to find a (pythonic) way to take that and populate the corresponding flag with a True. The lambda expression I'm using gives the correct result, but I'm hoping for a faster method.

For anyone wondering, the use case is that I want to round the values in the original DataFrame to the tens place such that each row sums to 100. I have pre-scaled this data so it should be close, but the rounding can cause the sum to come to 90 or 110. I intend to use this True/False matrix to decide which rounded value moved the most, then round it in the opposite direction, since this is the minimum-impact way to coerce the row to sum to 100 in chunks of 10.
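The adjustment described above can be sketched roughly as follows. This is only an illustration under assumptions that are not in the question: the helper name `round_rows_to_total` is hypothetical, each row is assumed to miss the target by at most one step of 10, and ties are resolved by taking the first matching column.

```python
import numpy as np
import pandas as pd

STEP = 10  # rounding granularity (tens place), matching round(-1)

def round_rows_to_total(df, total=100):
    """Hypothetical helper: round to tens, then nudge one value per row by one step."""
    rounded = df.round(-1)
    delta = (rounded - df).to_numpy()               # positive = rounded up
    excess = rounded.sum(axis=1).to_numpy() - total  # +10, 0 or -10 is assumed

    # Row too high: undo the largest round-up. Row too low: undo the largest round-down.
    col = np.where(excess > 0, delta.argmax(axis=1), delta.argmin(axis=1))

    values = rounded.to_numpy().copy()
    rows = np.arange(len(df))
    values[rows, col] -= np.sign(excess) * STEP      # no change where excess == 0
    return pd.DataFrame(values, index=df.index, columns=df.columns)

# First two rows of the sample data from the question
a = pd.DataFrame([[2.290119, 5.300725, 17.266693, 75.134857, 0.0, 0.0, 0.007606],
                  [0.000000, 7.560276, 55.579175, 36.858266, 0.0, 0.0, 0.002284]])
out = round_rows_to_total(a)
print(out.sum(axis=1).tolist())  # [100.0, 100.0]
```

Here the Boolean matrix is never materialized; `argmax`/`argmin` give the column position directly, which may be enough if the flags are only an intermediate step.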


Solution

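  • You can use idxmax to get the label of the column holding the max value in each row, then use NumPy broadcasting to compare that label against every column label. Note that idxmax returns only the first max per row, so ties are flagged once rather than on every tied cell.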

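    # Boolean mask: True where a column label equals that row's idxmax label
    m = c.columns.to_numpy() == c.idxmax(axis=1).to_numpy()[:, None]
    # m is already Boolean; keep the original index and column labels
    new_df = pd.DataFrame(m, index=c.index, columns=c.columns)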
    

    End result:

           0      1      2      3      4      5      6
    0  False  False  False   True  False  False  False
    1  False  False   True  False  False  False  False
    2  False   True  False  False  False  False  False
    3  False   True  False  False  False  False  False
    4  False  False   True  False  False  False  False
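If every tied max should be flagged, as the original lambda does, one possible alternative (not part of the answer above) is to compare the whole frame against the row-wise max with `DataFrame.eq`. The sketch below uses a small made-up frame, where the last row contains a deliberate tie, to show how the two approaches differ.

```python
import pandas as pd

# Illustrative stand-in for the delta frame `c`; the values are made up.
c = pd.DataFrame([[-2.3, 4.7, 2.7, 4.9],
                  [0.0, 2.4, 4.4, 3.1],
                  [1.0, 5.0, 5.0, 0.5]])  # last row has a tie for the max

# Compare every cell with its row's max; axis=0 aligns the Series on the row index.
flags_eq = c.eq(c.max(axis=1), axis=0)

# idxmax-based mask: only the first max in each row is flagged.
flags_idx = pd.DataFrame(
    c.columns.to_numpy() == c.idxmax(axis=1).to_numpy()[:, None],
    index=c.index, columns=c.columns,
)

print(flags_eq.sum(axis=1).tolist())   # [1, 1, 2]
print(flags_idx.sum(axis=1).tolist())  # [1, 1, 1]
```

Which behaviour is preferable depends on the use case: for picking a single value to re-round, exactly one flag per row is probably what is wanted.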