Search code examples
pythonrgroup-bydatatablepy-datatable

Python datatable: sum, groupby, column < 0


Hi I am struggling to translate some R code into Python code.

This is my R code:

  df_sum <- df[, .(
    Inflow = sum(subset(Amount, Amount>0)),
    Outflow = sum(subset(Amount, Amount<0)),
    Net = sum(Amount)
  ), by = Account]

This is my Python code so far:

df_sub = df[:, {'Inflow': dt.sum(dt.f.Amount),
                'Outflow': dt.sum(dt.f.Amount),
                'Net': dt.sum(dt.f.Amount)},
         dt.by('Account')]

I don't know how to include the subset for the inflow and outflow columns. Can anyone help?

This is the desired output (generated with R code):

     Account Inflow Outflow  Net
1: Account 1    151     -32  119
2: Account 2     51    -226 -175

Sample data:

{'Account': ['Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 1', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2', 'Account 2'], 'Amount': [34, 23, -23, -4, 34, 4, -3, 56, -2, 3, 5, 43, -67, -3, -78, -7, -4, -67]}

Solution

  • Use the ifelse function to replicate your R code:

    from datatable import dt, f, by, ifelse
    
       df[:, {"Inflow": dt.sum(ifelse(f.Amount > 0, f.Amount, None)),
              "Outflow": dt.sum(ifelse(f.Amount < 0, f.Amount, None )),
              "Net": dt.sum(f.Amount)}, 
          by("Account")]
    
         Account    Inflow  Outflow Net
    0   Account1       151   −32    119
    1   Account2        51   −226  −175