Search code examples
pandasrowsum

Pandas row sum for values > 0


I have a dataframe of the following format

ROW   Value1    Value2  Value3    Value4
1      10        10      -5        -2
2      50        20      -10       -7
3      10        5        0        -1

I am looking to calculate for each row the sum of positive totals and sum of negative totals. So essentially, the resulting frame should look like

ROW   Post_Total    Neg_Total
1      20            -7
2      70            -17
3      15            -1

One thing I have in my dataset, a column can have only positive or negative values.

Any ideas on how this can be done. I tried subsetting by >0 but was not successful. Thanks!


Solution

  • You could use:

    (df.melt(id_vars='ROW')
       .assign(sign=lambda d: np.where(d['value'].gt(0), 'Pos_Total', 'Neg_Total'))
       .groupby(['ROW', 'sign'])['value'].sum()
       .unstack('sign')
    )
    

    Or alternatively, using masks.

    numpy version (faster):

    import numpy as np
    a = df.set_index('ROW').values
    mask = a > 0
    pd.DataFrame({'Pos_Total': np.where(mask, a, 0).sum(1),
                  'Neg_Total': np.where(mask, 0, a).sum(1)})
    

    pandas version (slower than numpy but faster than melt):

    d = df.set_index('ROW')
    mask = d.gt(0)
    pd.DataFrame({'Pos_Total': d.where(mask).sum(1),
                  'Neg_Total': d.mask(mask).sum(1)},
                  index=df['ROW'])
    

    output:

         Pos_Total  Neg_Total
    ROW                      
    1         20.0       -7.0
    2         70.0      -17.0
    3         15.0       -1.0