r data.table r-s3

How to adjust j in `[.data.table` without breaking data.table's custom evaluation?


I'm trying to extend data.table to speed up and standardize analyses of complex survey designs. To do so, I'm adding a light layer on top of [.data.table that intercepts the call in j and, in a few circumstances, replaces the operation (e.g. mean with median) when special survey-type commands are needed. When special ones are not needed, it should keep the normal functions to take advantage of data.table's GForce-type optimizations.

Based on my partial understanding of S3 dispatch, NextMethod should be the appropriate function here, but it seems to pass j as the symbol j (e.g. a[, j] instead of a[, median(v1)]), which interacts badly with data.table's NSE. I've tried versions with do.call, but couldn't get past infinite recursion (do.call('[', ...) endlessly dispatches to [.dtsurvey).

Is there a clean way to adjust the arguments and pass it on to data.table? In the toy example below, I'd like to have the call return the median of column v1 even though the initial operation is mean.

library('data.table')

a = data.table(v1 = 1:10)
b = copy(a)

"[.dtsurvey" <- function(x, i, j, by, ...){
  
  j = substitute(j)
  print(j)
  if(j[[1]] == 'mean') j[[1]] = quote(median)
  print(j)

  NextMethod(`[`, x)
}
class(a) <- c('dtsurvey', class(a))
a[, mean(v1)]
#> mean(v1)
#> median(v1)
#> Error in `[.data.table`(a, , mean(v1)): j (the 2nd argument inside [...]) is a single symbol but column name 'j' is not found. Perhaps you intended DT[, ..j]. This difference to data.frame is deliberate and explained in FAQ 1.1.

Created on 2020-10-08 by the reprex package (v0.3.0)


Solution

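  • I don't think you'll be able to leverage NextMethod here; as far as I understand, it considers the arguments as they were passed, so the edited j never reaches data.table. One way around it is to capture the call with match.call(), replace j and the function being called, and evaluate the result in the caller's frame: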

    library(data.table)
    a = data.table(v1 = c(1,2,9))
    b = copy(a)
    
    "[.dtsurvey" <- function(x, i, j, by, ...){
      mc <- match.call()
      j <- substitute(j)
      j <- do.call(substitute, list(j, list(mean = quote(median))))
      mc[["j"]] <- j
      mc[[1]] <- quote(data.table:::`[.data.table`)
      eval.parent(mc)
    }
    
    class(a) <- c('dtsurvey', class(a))
    a[, mean(v1)]
    #> [1] 2
    b[, mean(v1)]
    #> [1] 4
    

    Created on 2020-10-08 by the reprex package (v0.3.0)
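
The match.call() / eval.parent() pattern used above can be seen in isolation with base R only. This is a minimal sketch, not data.table code: inner_fn, forward_plain and forward_call are hypothetical names, and inner_fn merely stands in for a function that captures its argument unevaluated, as [.data.table does with j.

```r
# Hypothetical stand-in for an NSE function such as `[.data.table`:
# it reports the expression it was handed, without evaluating it.
inner_fn <- function(x, expr) deparse(substitute(expr))

# Plain forwarding: inner_fn only ever sees the symbol `expr`.
forward_plain <- function(x, expr) inner_fn(x, expr)

# Call rewriting: take the call as written, point it at inner_fn,
# and evaluate it in the caller's frame.
forward_call <- function(x, expr) {
  mc <- match.call()
  mc[[1]] <- quote(inner_fn)
  eval.parent(mc)
}

plain_result <- forward_plain(1, mean(v1))
call_result  <- forward_call(1, mean(v1))
print(plain_result)
#> [1] "expr"
print(call_result)
#> [1] "mean(v1)"
```

Because the rewritten call is evaluated where the user typed it, the inner function sees the original expression rather than the wrapper's argument name, which is the behaviour data.table's NSE relies on.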

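    Alternatively, keep the generic `[` in the call and replace x with as.data.table(x), rather than referencing `[.data.table` via ::: :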

    "[.dtsurvey" <- function(x, i, j, by, ...){
      mc <- match.call()
      mc[["j"]] <- do.call(substitute, list(substitute(j), list(mean = quote(median))))
      mc[[1]] <- quote(`[`)
      mc[[2]] <- substitute(as.data.table(x))
      eval.parent(mc)
    }
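
Both versions rely on do.call(substitute, ...) to edit the captured j. The sketch below shows that step on its own in base R; swap_call and f are hypothetical helpers introduced only for illustration. Unlike the j[[1]] == 'mean' check in the question, this replaces the symbol wherever it appears in the expression, so a column that happened to be named mean would be renamed too.

```r
# Hypothetical helper, not part of data.table: replace one symbol with
# another everywhere inside a captured expression.
swap_call <- function(expr, from, to) {
  env <- setNames(list(as.name(to)), from)
  # substitute() quotes its first argument, so do.call() is used to pass it
  # the captured call held in `expr` rather than the symbol `expr` itself.
  do.call(substitute, list(expr, env))
}

f <- function(j) {
  j <- substitute(j)             # capture the unevaluated expression
  swap_call(j, "mean", "median")
}

res <- f(mean(v1) + mean(v2))
print(res)
#> median(v1) + median(v2)
```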