How to pass "everything possible" to by in a function?

I am trying to use data.table within a user-facing function in a package I'm working on. I would like this function to behave as much like data.table as possible. For example, my function also has a by argument, which is passed to the underlying data.table call inside the function. The user should be free to pass to "my" by anything that can be passed to by directly in a data.table.

Quoting from ?data.table, this includes:

  1. A single unquoted column name: e.g., DT[, .(sa=sum(a)), by=x]
  2. a list() of expressions of column names: e.g., DT[, .(sa=sum(a)), by=.(x=x>0, y)]
  3. a single character string containing comma separated column names (where spaces are significant since column names may contain spaces even at the start or end): e.g., DT[, sum(a), by="x,y,z"]
  4. a character vector of column names: e.g., DT[, sum(a), by=c("x", "y")]
  5. or of the form startcol:endcol: e.g., DT[, sum(a), by=x:z]
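For reference, the five forms can be tried directly on a small table. This is only a minimal sketch: the table DT and its columns x, y, z and a are made up here to match the names used in the help-page examples, and r1 to r5 are just illustrative result names.

```r
library(data.table)

# Hypothetical toy table; the column names mirror the ?data.table examples.
DT <- data.table(
  x = c(1, 1, -1),
  y = c("p", "p", "q"),
  z = c(TRUE, TRUE, FALSE),
  a = 1:3
)

r1 <- DT[, .(sa = sum(a)), by = x]                # 1. single unquoted column name
r2 <- DT[, .(sa = sum(a)), by = .(x = x > 0, y)]  # 2. list() of expressions
r3 <- DT[, sum(a), by = "x,y,z"]                  # 3. comma-separated string
r4 <- DT[, sum(a), by = c("x", "y")]              # 4. character vector
r5 <- DT[, sum(a), by = x:z]                      # 5. startcol:endcol
```

All five calls are plain data.table usage, so any wrapper that claims to mimic by should reproduce the same grouping columns for each of them.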

Here is a minimal (partially) working example to make my intent clear:

library(data.table)
#> Warning: package 'data.table' was built under R version 3.6.2
sample_dt <- data.table(a = 1:5, b = 5:1)

count_by <- function(dt, by = NULL) {
    by <- substitute(by)
    dt[, .N, by = eval(by, dt, parent.frame())]
}

count_by(sample_dt)
#>    N
#> 1: 5
count_by(sample_dt, by = a)       # refers to 1 from the list above
#>    by N
#> 1:  1 1
#> 2:  2 1
#> 3:  3 1
#> 4:  4 1
#> 5:  5 1
count_by(sample_dt, by = list(a)) # refers to 2 from the list above
#>    a N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = "a")     # refers to 3 from the list above
#>    a N
#> 1: 1 1
#> 2: 2 1
#> 3: 3 1
#> 4: 4 1
#> 5: 5 1
count_by(sample_dt, by = c("a"))  # refers to 4 from the list above
#> Error in `[.data.table`(dt, , .N, by = eval(by, dt, parent.frame())): 'by' appears to evaluate to column names but isn't c() or key(). Use by=list(...) if you can. Otherwise, by=evalc("a") should work. This is for efficiency so data.table can detect which columns are needed.
count_by(sample_dt, by = a:b)     # refers to 5 from the list above
#>    a b N
#> 1: 1 5 1
#> 2: 2 4 1
#> 3: 3 3 1
#> 4: 4 2 1
#> 5: 5 1 1

Created on 2020-02-18 by the reprex package (v0.3.0)

Apart from case 4, everything works as expected using simple substitution and evaluation in the proper context. So my question is:

How can I write functions that use data.table internally and mimic the original by interface exactly?

  • Is there a particular reason for using eval inside the data.table? I think this would be better:

    count_by <- function(dt, by = NULL) {
      eval(substitute(dt[, .N, by = by]))
    }

    It passes all of your test cases, including case 1, where your function names the grouping column by instead of a.
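To make the suggestion concrete, here is a minimal sketch that runs the substitute-the-whole-call version against the five forms from the question. It reuses sample_dt from the question; the res_* names are only illustrative. One assumption to keep in mind: the substituted call is evaluated from inside the function, so the object passed as dt must be resolvable from there (passing parent.frame() as the second argument of eval would be one possible variation).

```r
library(data.table)

sample_dt <- data.table(a = 1:5, b = 5:1)

count_by <- function(dt, by = NULL) {
  # substitute() splices the caller's unevaluated `dt` and `by` expressions
  # into the call, so data.table sees e.g. sample_dt[, .N, by = c("a")]
  # exactly as if the user had typed it.
  eval(substitute(dt[, .N, by = by]))
}

res_none  <- count_by(sample_dt)                # default by = NULL
res_name  <- count_by(sample_dt, by = a)        # case 1
res_list  <- count_by(sample_dt, by = list(a))  # case 2
res_str   <- count_by(sample_dt, by = "a")      # case 3
res_chr   <- count_by(sample_dt, by = c("a"))   # case 4
res_range <- count_by(sample_dt, by = a:b)      # case 5
```

Because data.table receives the original expression rather than its evaluated value, it can apply its usual by parsing, which is why case 4 no longer errors and case 1 keeps the column name a.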