
How to import the data.table development version hosted on GitHub into an R package?


Outline

The data.table development version (v1.14.9) provides the env argument, a new interface for programming on the language with data.table (see this vignette). This is especially useful when using data.table inside functions. I want to use this interface within a package I am currently developing, but I can't manage to import the data.table development version into the package.
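To illustrate what env does, here is a minimal sketch, assuming data.table 1.14.9 (development) or later; add_one() and demo_dt are hypothetical names. A column name passed as a string is substituted as a symbol into the j expression, so no get() or eval(parse()) is needed.

```r
library(data.table)

# Hypothetical example: increment the column whose name is given as a string.
# `env` substitutes the string (e.g. "a") as the symbol `a` on both sides of :=.
add_one <- function(dt, col) {
  dt[, col := col + 1L, env = list(col = col)]
  invisible(dt)
}

demo_dt <- data.table(a = 1:3)
add_one(demo_dt, "a")  # modifies demo_dt by reference
print(demo_dt$a)
# [1] 2 3 4
```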

What I tried

  • I updated data.table with data.table::update_dev_pkg() and tested a function that uses the new env argument; everything works fine.
  • I tried to use the very same function within the package via devtools::load_all(), and I get an error.
  • I added Remotes: github::Rdatatable/data.table to the DESCRIPTION file, as suggested here, here and here.
  • I added Additional_repositories: https://Rdatatable.gitlab.io/data.table to the DESCRIPTION file, as suggested here.
  • I changed it to additional_repositories: https://github.com/Rdatatable/data.table.git after reading this and this.
  • I also read what Wickham writes in his R Packages book about importing nonstandard dependencies (see this); what he says is roughly the same as the Remotes approach mentioned above.

But to no avail. I'm not sure what exactly goes into these fields to get the latest data.table development version into my package.
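One quick way to confirm that the data.table loaded in a given session actually supports env (instead of testing with a full function, as in the first bullet) is to inspect the formals of the [.data.table method. This is a sketch; has_env_arg() is a hypothetical helper, and it only checks the installed version, not whether the package imports data.table correctly.

```r
# Hypothetical helper: TRUE if the installed data.table's `[` method has an
# `env` argument (introduced with the 1.14.9 development version).
has_env_arg <- function() {
  if (!requireNamespace("data.table", quietly = TRUE)) return(FALSE)
  bracket <- utils::getFromNamespace("[.data.table", "data.table")
  "env" %in% names(formals(bracket))
}

cat("data.table version:", format(utils::packageVersion("data.table")), "\n")
cat("supports env argument:", has_env_arg(), "\n")
```

If this reports the same version and TRUE both interactively and after devtools::load_all(), the installed version is probably not what is causing the error.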

Minimal example

The following function replaces numeric codes with character values according to a label definition table (.lbl) that is applied to a data table (.dtt).

# libraries (for interactive use only, do not deploy inside package)
data.table::update_dev_pkg()
library(purrr)
library(data.table)

# function (use this inside the package)
make_labels <-
    function(.dtt,.lbl){
        f <- 
            function(.dtt,clm,val,lbl){
                .dtt[
                    ,clm := as.character(clm)
                    ,env = list(clm=clm)
                ][
                     clm == val
                    ,clm := lbl
                    ,env = list(clm=clm,val=val,lbl=I(lbl))
                ]
            }
        purrr::pwalk(.lbl,f,.dtt)
    }

#sample data
dtt <-
    data.table::data.table(
         v1 = rep(1:2,5)
        ,v2 = rep(1:5,2)
    )        
lbl <-
    data.table::data.table(
         clm = c(rep("v1",2),rep("v2",5))
        ,val = c(1:2,1:5)
        ,lbl = letters[1:7]
    )

#deploy function
make_labels(dtt,lbl)

This works without complaint when the function is loaded interactively. However, when ...

  • data.table is imported via the NAMESPACE file import mechanism (by adding Remotes: github::Rdatatable/data.table to the DESCRIPTION file) and subsequently
  • the function is loaded by devtools::load_all() within the package

R throws the following error:

> make_labels(dtt,lbl)
Error in `pmap()`:
ℹ In index: 1.
Caused by error:
! Check that is.data.table(DT) == TRUE. Otherwise, :=, `:=`(...) 
and let(...) are defined for use in j, once only and in particular 
ways. See help(":=").
Run `rlang::last_trace()` to see where the error occurred.

Solution

  • I found the solution myself. The following steps make the error go away:

    • Add Remotes: Rdatatable/data.table to the DESCRIPTION file. This way the data.table development version is installed automatically from GitHub (there is no need to write github:: explicitly in the DESCRIPTION file; GitHub is the default source).
    • Put #' @import data.table at the top of the function file and run devtools::document(). This way roxygen automatically adds import(data.table) to the NAMESPACE file.

    I finally found the solution by reading the data.table vignette on importing data.table into a package (here) as well as two Stack Overflow posts on the topic (here and here).

    I had successfully used data.table within packages before, so I assumed the error was caused (solely) by loading the development version into the package. In fact, there was a second problem: the missing NAMESPACE entry, which roxygen generates from the @import tag.
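For reference, here is a sketch of how the package source file might look after both steps. The roxygen title and @param lines are illustrative; the @import data.table tag is the part the solution relies on. The Imports line and its version constraint are assumptions not stated in the post: Remotes only tells the installer where to fetch data.table from, and the dependency itself still has to be declared under Imports.

```r
# R/make_labels.R (sketch of the package source file after the fix)

#' Replace numeric codes with labels
#'
#' @param .dtt A data.table to relabel (modified by reference).
#' @param .lbl A data.table with columns clm, val and lbl.
#' @import data.table
#' @export
make_labels <- function(.dtt, .lbl) {
  f <- function(.dtt, clm, val, lbl) {
    .dtt[
      , clm := as.character(clm)
      , env = list(clm = clm)
    ][
      clm == val
      , clm := lbl
      , env = list(clm = clm, val = val, lbl = I(lbl))
    ]
  }
  purrr::pwalk(.lbl, f, .dtt)
}

# devtools::document() then writes entries such as these to NAMESPACE:
#   export(make_labels)
#   import(data.table)
#
# DESCRIPTION (assumed field values; adjust the version constraint as needed):
#   Imports: data.table (>= 1.14.9), purrr
#   Remotes: Rdatatable/data.table
```

As far as I know, the same importing vignette also describes setting .datatable.aware = TRUE somewhere in the package code as an alternative for packages that do not import data.table but still need [.data.table to treat := correctly.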