Search code examples
pandaspython-3.8

Get sum of all columns greater than 0 except the highest values


I have a dataframe as follows:

Group   Val1    Val2    Val3    Val4
A        -94    96         16   -92
B         30    59        -10    44
C         50    -18       -30    24
D         61    49        -15   -95

I need to find the sum of all positive values in each group except the highest value to get the following:

Group   Sum
A       16
B       74
C       24
D       49

for group A, I ignored 96 and only 16 was used to get sum of 16.

for group B, I ignored 59 and 30, 44 was used to get sum of 74

If there is only 1 positive value, we keep them we don't drop any values.

I am not sure how to do the above.


Solution

  • Try this:

    df = df.set_index('Group')
    df.where(df.ne(df.max(axis=1), axis=0) & (df > 0)).sum(1)
    

    Output:

    Group
    A    16.0
    B    74.0
    C    24.0
    D    49.0
    dtype: float64
    

    Details:

    Find max on each row, look for values in that row that are not equal to that max AND values greater than zero then sum with axis=1.

    Or..

    df.mask(df<0).sum(1) - df.max(1)
    

    Output:

    Group
    A    16.0
    B    74.0
    C    24.0
    D    49.0
    dtype: float64