I have a dataframe of the following format
ROW Value1 Value2 Value3 Value4
1 10 10 -5 -2
2 50 20 -10 -7
3 10 5 0 -1
I am looking to calculate for each row the sum of positive totals and sum of negative totals. So essentially, the resulting frame should look like
ROW Post_Total Neg_Total
1 20 -7
2 70 -17
3 15 -1
One thing I have in my dataset, a column can have only positive or negative values.
Any ideas on how this can be done. I tried subsetting by >0 but was not successful. Thanks!
You could use:
(df.melt(id_vars='ROW')
.assign(sign=lambda d: np.where(d['value'].gt(0), 'Pos_Total', 'Neg_Total'))
.groupby(['ROW', 'sign'])['value'].sum()
.unstack('sign')
)
Or alternatively, using masks.
numpy version (faster):
import numpy as np
a = df.set_index('ROW').values
mask = a > 0
pd.DataFrame({'Pos_Total': np.where(mask, a, 0).sum(1),
'Neg_Total': np.where(mask, 0, a).sum(1)})
pandas version (slower than numpy but faster than melt):
d = df.set_index('ROW')
mask = d.gt(0)
pd.DataFrame({'Pos_Total': d.where(mask).sum(1),
'Neg_Total': d.mask(mask).sum(1)},
index=df['ROW'])
output:
Pos_Total Neg_Total
ROW
1 20.0 -7.0
2 70.0 -17.0
3 15.0 -1.0