Hi I am struggling to translate some R code into Python code.
This is my R code:
df_sum <- df[, .(
Inflow = sum(subset(Amount, Amount>0)),
Outflow = sum(subset(Amount, Amount<0)),
Net = sum(Amount)
), by = Account]
This is my Python code so far:
df_sub = df[:, {'Inflow': dt.sum(dt.f.Amount),
'Outflow': dt.sum(dt.f.Amount),
'Net': dt.sum(dt.f.Amount)},
dt.by('Account')]
I don't know how to include the subset for the inflow and outflow columns. Can anyone help?
This is the desired output (generated with R code):
Account Inflow Outflow Net
1: Account 1 151 -32 119
2: Account 2 51 -226 -175
Sample data:
{'Account': ['Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2'], 'Amount': [34, 23, -23, -4, 34, 4, -3, 56, -2, 3, 5, 43, -67, -3, -78, -7, -4, -67]}
Use the ifelse function to replicate your R code:
from datatable import dt, f, by, ifelse
df[:, {"Inflow": dt.sum(ifelse(f.Amount > 0, f.Amount, None)),
"Outflow": dt.sum(ifelse(f.Amount < 0, f.Amount, None )),
"Net": dt.sum(f.Amount)},
by("Account")]
Account Inflow Outflow Net
0 Account1 151 −32 119
1 Account2 51 −226 −175