r data.table r-package

data.table's .SD behaves differently in functions that are part of an R package than in functions that are sourced directly


I observed the following odd behaviour of data.table's .SD notation. Consider the function

min_ex = function(dt,bycols)
{
  dt_sub = dt[, .SD, .SDcols = bycols]
  print(dt_sub)
}

This function is supposed to take a data.table dt and a character vector bycols of column names of dt, and just print those columns. It works exactly as expected when the function is defined in a script or sourced from a standalone file:

library(data.table)

min_ex = function(dt,bycols)
{
  dt_sub = dt[, .SD, .SDcols = bycols]
  print(dt_sub)
}

dt = data.table(x = rep(1,3),y = 1:3)
min_ex(dt,'x')

This prints the x column of dt. However, when the function is included in an R package, it behaves differently. Specifically, when my working directory is set to the package directory, a file in the ./R folder contains this function, and I run

rm(list = ls())

devtools::load_all()
library(data.table)

dt = data.table(x = rep(1,3),y = 1:3)
min_ex(dt,'x')

it prints "Null data.table (0 rows and 0 cols)". I put a minimal example package for this on GitHub. It can be installed directly from GitHub, and the function then always prints the Null data.table.

I would like to understand why this function behaves differently when it is part of a package and how I can get the function to behave as expected.

What I have tried so far:

  • It does not matter whether I use devtools::load_all or install the package. The function behaves the same (prints the Null data.table). It also behaves the same if you install directly from the GitHub repo link above using devtools::install_github.
  • I used R 4.2.2 and data.table 1.14.8 and tried this both under Linux and Windows.
  • It does not depend on the function arguments being named dt and bycols. Renaming them does not change the behaviour.
  • It does not depend on whether data.table is loaded before or after devtools::load_all is run or the example package is installed. Listing data.table under Imports in the DESCRIPTION file of the example package does not affect the behaviour either.
  • Even weirder: the function above does not throw an error when bycols is missing and you just run min_ex(dt). In this case, it behaves as if bycols contained all column names of the data.table. I don't understand why this is the case and would expect an error, because an argument without a default is missing.
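The last observation may have more to do with base R than with data.table. R evaluates arguments lazily, and missing() still reports TRUE when a missing argument is merely forwarded to another function. The sketch below shows this in base R only. The names inner_fn and outer_fn are hypothetical, and whether data.table checks .SDcols with missing() internally is an assumption that is not verified here.

```r
# inner_fn() and outer_fn() are hypothetical names used only for illustration.

# Reports whether its argument was supplied by the caller.
inner_fn <- function(cols) {
  missing(cols)
}

# Forwards its own argument without ever evaluating it, much like
# min_ex() forwards bycols to .SDcols.
outer_fn <- function(bycols) {
  inner_fn(cols = bycols)
}

res_missing <- outer_fn()     # bycols not supplied: no error is raised
res_given   <- outer_fn("x")  # bycols supplied

print(res_missing)  # TRUE
print(res_given)    # FALSE
```

If the callee treats a missing argument as "use the default", forwarding a missing bycols would silently select the default (here, presumably all columns) instead of throwing an error.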

Solution

  • As pointed out by @r2evans, the problem can be solved by including .datatable.aware <- TRUE in the package code.

    It is also sufficient to list data.table as a dependency in the DESCRIPTION file under Depends; previously I had only tried Imports.
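A minimal sketch of the first fix, assuming it is placed in a file such as R/zzz.R inside the package (the file name is hypothetical; any file in ./R should work):

```r
# Hypothetical file: R/zzz.R (the name is an assumption, not a requirement).

# Mark this package's namespace as data.table-aware, so that calls like
# dt[, .SD, .SDcols = bycols] inside package functions are handled with
# data.table semantics rather than falling back to data.frame semantics.
.datatable.aware <- TRUE
```

After adding the file, re-running devtools::load_all() and min_ex(dt, 'x') should print the x column, as it does in the sourced version.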