r devtools r-package

Proper way to include internal and external data in a custom R package


I'm creating an R package with some bundled datasets. I want to export them for users, and I also want to use them internally in the package's functions.

For example, let's say I create a dataset called measurements like this:

measurements <- data.frame(id = c(1:10), value = runif(10))
usethis::use_data(measurements, overwrite = TRUE)

That makes the measurements data frame available to users, who can refer to it simply as measurements.

Now, I also want to write a package function that uses the same data frame internally:

#' fn_docalc
#' 
#' @param x Value to multiply by
#' 
#' @return Measurements dataframe multiplied by x
#' @export

fn_docalc <- function(x) {
  measurements$value <- measurements$value * x
  measurements
}

This works fine, with one exception: if the user loads the package and also happens to create their own variable called measurements in the global environment, fn_docalc operates on that global variable instead of the package's dataset.

How can I write the function or package so that fn_docalc always uses the package's own measurements, even when a different global measurements exists?


Solution

  • You used usethis::use_data(measurements, overwrite = TRUE), which put your dataset in the package's data subdirectory. Datasets stored there have somewhat unusual semantics.

    If you have LazyData: true in your DESCRIPTION file, then the data object is made available alongside the package's exports, but it is not in the internal environment that the package's functions use. In that case your functions need to refer to it with the myPkgname:: prefix, e.g. myPkgname::measurements.

    If you don't have that LazyData: line, or you set it to false, then the data is not visible at all until you call the data() function, which by default loads it into the environment it is called from (the global environment when called at top level).
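
    As a rough illustration of these two behaviours, the sketch below uses the built-in datasets package as a stand-in for myPkgname (an assumption made only so the code runs without a custom package). Here mtcars plays the role of measurements, and e is a hypothetical scratch environment.

    ```r
    # Illustration only: 'datasets' stands in for myPkgname, 'mtcars' for measurements.

    # A user-created global object with the same name as a packaged dataset.
    mtcars <- "user's own object"

    # An unqualified lookup from the global environment finds the user's object...
    shadowed <- mtcars

    # ...while the pkg:: prefix retrieves the packaged dataset regardless.
    packaged <- datasets::mtcars

    # data() copies a dataset into a chosen environment; without 'envir' it uses
    # the environment it is called from. 'e' is a hypothetical scratch environment.
    e <- new.env()
    data("mtcars", package = "datasets", envir = e)

    is.character(shadowed)   # TRUE: the global object shadows the dataset
    is.data.frame(packaged)  # TRUE: the prefix bypasses the global object
    is.data.frame(e$mtcars)  # TRUE: data() put a copy into 'e'
    ```

    Inside a package function, the same prefix is what keeps a lazy-loaded dataset from being shadowed by a user's global variable.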

    For your use case, where you want the data available both to users and to your own functions, neither of these makes sense. You want the dataset visible in both environments.

    To get it into the internal environment, you create it in one of your .R files in the R directory. For your sample data, just put

    measurements <- data.frame(id = c(1:10), value = runif(10))
    

    in one of those files. For a larger dataset you might want to store it in a compressed format somewhere (e.g. in inst/extdata), and have your .R file read it in at package install time.

    To also get it into the exports, list it in an export() directive in your NAMESPACE file, or let roxygen2 generate that for you by adding an @export tag to its definition in the .R file.
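
    The effect of defining the dataset next to the function can be sketched without building a package. In the snippet below, ns is a hypothetical stand-in for the package namespace, not a real one. In an actual package the two definitions would simply live in files under R/.

    ```r
    # 'ns' is a hypothetical stand-in for the package namespace. Like a real
    # namespace, it is searched before the global environment.
    ns <- new.env(parent = globalenv())

    # Equivalent of defining the dataset and the function in files under R/.
    local({
      measurements <- data.frame(id = 1:10, value = runif(10))

      fn_docalc <- function(x) {
        measurements$value <- measurements$value * x
        measurements
      }
    }, envir = ns)

    # The user now creates a conflicting object in the global environment.
    measurements <- data.frame(id = 1:3, value = c(100, 200, 300))

    # fn_docalc finds 'measurements' in its own enclosing environment first,
    # so the global object is ignored.
    result <- ns$fn_docalc(2)
    nrow(result)  # 10, not 3
    ```

    Because the function's enclosing environment holds its own measurements, lookup stops there and never reaches the global environment.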