python r group-by datatable py-datatable

Python datatable: sum, groupby, column < 0

Hi I am struggling to translate some R code into Python code.

This is my R code:

  df_sum <- df[, .(
    Inflow = sum(subset(Amount, Amount>0)),
    Outflow = sum(subset(Amount, Amount<0)),
    Net = sum(Amount)
  ), by = Account]

This is my Python code so far:

df_sub = df[:, {'Inflow': dt.sum(dt.f.Amount),
                'Outflow': dt.sum(dt.f.Amount),
                'Net': dt.sum(dt.f.Amount)},
         dt.by('Account')]

I don't know how to include the subset for the inflow and outflow columns. Can anyone help?

This is the desired output (generated with R code):

     Account Inflow Outflow  Net
1: Account 1    151     -32  119
2: Account 2     51    -226 -175

Sample data:

{'Account': ['Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2'], 'Amount': [34, 23, -23, -4, 34, 4, -3, 56, -2, 3, 5, 43, -67, -3, -78, -7, -4, -67]}

Solution

Use the ifelse function to replicate your R code:

from datatable import dt, f, by, ifelse

   df[:, {"Inflow": dt.sum(ifelse(f.Amount > 0, f.Amount, None)),
          "Outflow": dt.sum(ifelse(f.Amount < 0, f.Amount, None )),
          "Net": dt.sum(f.Amount)}, 
      by("Account")]

     Account    Inflow  Outflow Net
0   Account1       151   −32    119
1   Account2        51   −226  −175