I have the following DataFrame:
df = pd.DataFrame({'a': [0.28, 0, 0.25, 0.85, 0.1],
'b': [0.5, 0.5, 0, 0.75, 0.1],
'c': [0.33, 0.7, 0.25, 0.2, 0.5],
'd': [0, 0.25, 0.2, 0.66, 0.1]})
Output:
a b c d
0 0.28 0.50 0.33 0.00
1 0.00 0.50 0.70 0.25
2 0.25 0.00 0.25 0.20
3 0.85 0.75 0.20 0.66
4 0.10 0.10 0.50 0.10
For each column, I want to sum the top n
max values of the column, where n
is determined by how many row max values that column contains.
For example, column b
has a row max only in row 1, so its sum is the sum of the top 1 max values in that column, which is just 0.5
-- but column c
has three row-maxes, located in rows 1, 2, and 4, so the top 3 max values of column c
should be summed.
Expected output:
a b c d
0 0.28 0.50 0.33 0.00
1 0.00 0.50 0.70 0.25
2 0.25 0.00 0.25 0.20
3 0.85 0.75 0.20 0.66
4 0.10 0.10 0.50 0.10
count 1.10 0.50 1.45 0.00
where
df.append(
df.where( # only look at values that are max for the row
df.eq( # compare max values to all values in row just
# in case there are more than 1
df.max(axis=1), # actually get max values
axis=0
)
).sum().rename('count')
)
a b c d
0 0.28 0.50 0.33 0.00
1 0.00 0.50 0.70 0.25
2 0.25 0.00 0.25 0.20
3 0.85 0.75 0.20 0.66
4 0.10 0.10 0.50 0.10
count 1.10 0.50 1.45 0.00