
R package dev: should I make data files internal or external?


(Trying again with this question to make it more clear.)

I am writing a package that makes it easier to access data from a web API, and I am trying to decide whether to make the lookup tables and query defaults internal or external data, as outlined in the Data chapter of R Packages.

As I understand it, each option has drawbacks. Internal data is meant for data used only by the package and invisible to users. It is added with devtools::use_data(x, mtcars, internal = TRUE), which writes sysdata.rda to the package's R/ folder. However, although the package "needs" the data tables, I also want them to be visible to users so that they can correct errors and perhaps add data files by pull request to extend the package's capabilities. Furthermore, since I'm dealing with multiple files, not all of which are available at the moment, rebundling everything into R/sysdata.rda every time something changes seems inconvenient.

An alternative would be to make the lookup tables and query defaults external data, which is added with the default internal = FALSE: devtools::use_data(x, mtcars) writes x.rda and mtcars.rda to the package's data/ folder. The advantage is that such data is clearly visible to users. The downside is that I don't know how to access it from within the package's functions without getting this error when running devtools::check(): object 'querydefaults' not found. What is the proper way to do this?
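For reference, here is a minimal sketch of two common ways to reach a lazy-loaded dataset from inside a function without relying on the owning package being attached. It uses datasets::mtcars as a stand-in because the package in question is not available here; the function names first_rows() and column_names() are hypothetical. In the package itself the equivalent would presumably be a qualified reference such as yourpkg::querydefaults, which assumes LazyData: true in DESCRIPTION.

```r
# Option 1: namespace-qualified access. This does not depend on the owning
# package being attached with library(), as long as it lazy-loads its data.
first_rows <- function(n = 3) {
  utils::head(datasets::mtcars, n)
}

# Option 2: load the dataset into the function's own environment, so the
# lookup never depends on the search path.
column_names <- function() {
  utils::data("mtcars", package = "datasets", envir = environment())
  names(mtcars)
}

res1 <- first_rows(3)
res2 <- column_names()
```

Whether either option removes the check error depends on how the package is set up, so treat this as something to try rather than a confirmed fix.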


Solution

  • You can add the dataset as both external and internal data, which resolves the issue with devtools::check(). See the RIC package for an example.
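
A minimal sketch of what "both external and internal" amounts to on disk, using only base R so it runs outside a package. The package name mypkg and the contents of querydefaults are made up for illustration; a temporary directory stands in for the package root. In a real package the two save() calls would normally be replaced by the use_data() calls shown in the comments (use_data() now lives in usethis rather than devtools).

```r
# Stand-in for the package source directory ("mypkg" is a hypothetical name).
pkg_root <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_root, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg_root, "data"), showWarnings = FALSE)

# Hypothetical lookup table; the values are invented for this example.
querydefaults <- data.frame(
  param = c("format", "limit"),
  value = c("json", "100"),
  stringsAsFactors = FALSE
)

# External copy: visible to users, one .rda file per object under data/.
save(querydefaults, file = file.path(pkg_root, "data", "querydefaults.rda"))

# Internal copy: R/sysdata.rda is loaded into the package namespace, so
# package functions can refer to querydefaults directly.
save(querydefaults, file = file.path(pkg_root, "R", "sysdata.rda"))

# Inside a real package, the equivalent calls would be:
#   usethis::use_data(querydefaults, overwrite = TRUE)
#   usethis::use_data(querydefaults, internal = TRUE, overwrite = TRUE)

written <- file.exists(
  file.path(pkg_root, c("data/querydefaults.rda", "R/sysdata.rda"))
)
```

The trade-off is that the two copies have to be regenerated together whenever the table changes, so it may help to keep these calls in a single script (for example under data-raw/).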