
Programming on data.table with "env" inside a function


I am interested in joining two data.tables inside a function. However, when I use the new env argument for programming on data.table, the join fails because the column I try to join on supposedly does not exist: I get an "argument specifying columns received non-existing columns" error. How can I programmatically pass the join column to such a function? Below is a minimal reproducible example of this surprising failure.

library(data.table)
library(magrittr)  # for %>%

dt.mwe.1 <- data.table(a = c(1,2,3,4,0,10))

mwe_function = function(dt, merge_var){
  dt.internal = 
    data.table(z = min(dt):max(dt)) %>% 
    .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
  dt.internal2 = 
    data.table(z = min(dt):max(dt)) %>% 
    .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
  dt.internal
  dt.internal[dt.internal2, on = .(mv), 
              env = list(mv = merge_var)] %>% `[`
}
# fails
mwe_function(dt = dt.mwe.1, merge_var = "a")
# also fails
mwe_function(dt = dt.mwe.1, merge_var = a)

Solution

  Maybe I am missing your point, but what about passing merge_var straight to on:

    mwe_function = function(dt, merge_var){
      dt.internal = 
        data.table(z = min(dt):max(dt)) %>% 
        .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
      dt.internal2 = 
        data.table(z = min(dt):max(dt)) %>% 
        .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
      # on is not substituted by env, but it accepts column names as strings
      dt.internal[dt.internal2, on = merge_var] %>% `[`
    }
    
    mwe_function(dt = dt.mwe.1, merge_var = "a")
    
    #         a
    #     <int>
    #  1:     0
    #  2:     1
    #  3:     2
    #  4:     3
    #  5:     4
    #  6:     5
    #  7:     6
    #  8:     7
    #  9:     8
    # 10:     9
    # 11:    10
    

    From the ?data.table help page:

    env: List or an environment, passed to ‘substitute2’ for
             substitution of parameters in ‘i’, ‘j’ and ‘by’ (or ‘keyby’).
             Use ‘verbose’ to preview constructed expressions.
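To see what env does to an expression, substitute2() can be called directly. This is a minimal sketch, assuming a data.table version that exports substitute2() (the function the help text above names). A character value in env replaces the placeholder symbol, including where it appears as an argument name inside .(), which is why .(mv = z) yields a column called a.

```r
library(data.table)

# "a" is turned into the symbol `a`, also when `mv` is an argument name in .()
e <- substitute2(.(mv = z), env = list(mv = "a"))
print(e)
# .(a = z)

# The same placeholder used as an ordinary symbol inside an expression
e2 <- substitute2(mv * 2, env = list(mv = "a"))
print(e2)
```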
    

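    So, judging from the help, env only substitutes inside i, j and by (or keyby); it is not applied to the on argument. Inside the join, on = .(mv) therefore looks for a column literally named mv, which does not exist. The on argument, however, already accepts a character vector of column names, so merge_var can be passed to it directly.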
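A minimal standalone sketch of the same idea, without magrittr. The tables x and y are small made-up examples (hypothetical data, not from the question): env builds the column a, and the join column is handed to on as a plain string.

```r
library(data.table)

merge_var <- "a"  # join column name held in a variable

# env renames the placeholder column mv to "a"
x <- data.table(z = 0:3)[, .(mv = z), env = list(mv = merge_var)]
y <- data.table(z = 2:5)[, .(mv = z), env = list(mv = merge_var)]

# env is not applied to `on`, so give on the column name as a string;
# x[y, ...] keeps one row per row of y
res <- x[y, on = merge_var]
print(res)
```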


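    NSE Approach

    To accept an unquoted name (merge_var = a), capture the argument with substitute() before it is evaluated and turn it into a string with deparse(). Otherwise R evaluates a when building env = list(mv = merge_var) and typically fails with "object 'a' not found".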

    mwe_function = function(dt, merge_var){
      # capture the unevaluated argument (the symbol a) and convert it to "a"
      merge_var <- deparse(substitute(merge_var))
      dt.internal = 
        data.table(z = min(dt):max(dt)) %>% 
        .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
      dt.internal2 = 
        data.table(z = min(dt):max(dt)) %>% 
        .[ , .(mv = z) , env = list(mv = merge_var)] %>% `[`
      # on is not substituted by env, but it accepts column names as strings
      dt.internal[dt.internal2, on = merge_var] %>% `[`
    }
    
    mwe_function(dt = dt.mwe.1, merge_var = a)
    
    #         a
    #     <int>
    #  1:     0
    #  2:     1
    #  3:     2
    #  4:     3
    #  5:     4
    #  6:     5
    #  7:     6
    #  8:     7
    #  9:     8
    # 10:     9
    # 11:    10
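If the function should accept both merge_var = "a" and merge_var = a, one option is to branch on the type of the captured expression. This is a sketch, not the answer's method, and the name mwe_function_any is hypothetical:

```r
library(data.table)

# Hypothetical variant accepting merge_var = "a" or merge_var = a
mwe_function_any <- function(dt, merge_var) {
  # substitute() returns a string literal as-is and a bare name as a symbol
  mv_expr <- substitute(merge_var)
  merge_var <- if (is.character(mv_expr)) mv_expr else deparse(mv_expr)

  rng <- min(dt):max(dt)
  dt.internal  <- data.table(z = rng)[, .(mv = z), env = list(mv = merge_var)]
  dt.internal2 <- data.table(z = rng)[, .(mv = z), env = list(mv = merge_var)]

  # env does not reach `on`, so pass the column name as a string
  dt.internal[dt.internal2, on = merge_var]
}

dt.mwe.1 <- data.table(a = c(1, 2, 3, 4, 0, 10))
r1 <- mwe_function_any(dt.mwe.1, "a")
r2 <- mwe_function_any(dt.mwe.1, a)
```

The trade-off of this NSE design: calling mwe_function_any(dt.mwe.1, col) with col <- "a" would use the literal name "col", not the value stored in col.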