How to store frequently used data or parameters within an R package?


I am authoring an R package and there are several numerical vectors that users will frequently use as arguments to various package functions. What would be the best way to store these vectors within the package so that users can easily access them?

One idea I had was to save each vector as a data file in the package's data directory. Then users would be able to use the data file's name in place of the vector when needed (at least, I can do this during development). I like this idea, but I am not sure whether this solution would violate CRAN rules/norms or cause any problems.

# To create one such vector as a data file
octants <- c(90, 135, 180, 225, 270, 315, 360, 45)
devtools::use_data(octants)
# To access this vector in usage
my_function(data, octants)

Another idea I had was to create a separate function that returns the desired vector. Then users would be able to call the appropriate function when needed. This might be better than data for some reason, but I worry about users forgetting the () after the function name.

# To create the vector within a function
octants <- function() c(90, 135, 180, 225, 270, 315, 360, 45)
# To access this vector in usage
my_function(data, octants()) # works
my_function(data, octants) # doesn't work

Does anyone have ideas on which solution would be preferable or any better alternatives?


Solution

  • I'll be honest, I spent quite a long time carefully reading the manual asking myself the same questions. Do it: it's a good idea, it's useful, and there are tools to help you. The Writing R Extensions manual describes the formats in which you can save your data and how to follow R standards.

    To provide data within a package, I would advise using:

    devtools::use_data(...,internal=FALSE,overwrite=TRUE)
    

    where ... stands for the unquoted names of the datasets you want to save.

    https://www.rdocumentation.org/packages/devtools/versions/1.13.3/topics/use_data
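
    To make the effect of that call concrete, here is a minimal base R sketch of what I understand use_data(..., internal = FALSE) to do: it writes each object to data/<name>.rda with save(). The directory name mypkg is a hypothetical stand-in for a package source directory, and octants is the vector from the question. (As far as I know, internal = TRUE writes to R/sysdata.rda instead, which is not visible to users.)

    ```r
    # Hypothetical stand-in for a package source directory (an assumption).
    pkg_dir <- file.path(tempdir(), "mypkg")
    data_dir <- file.path(pkg_dir, "data")
    dir.create(data_dir, recursive = TRUE, showWarnings = FALSE)

    # The vector from the question.
    octants <- c(90, 135, 180, 225, 270, 315, 360, 45)

    # One object per file, with the file named after the object.
    rda_path <- file.path(data_dir, "octants.rda")
    save(octants, file = rda_path)

    # Load the file into a fresh environment to check the round trip.
    env <- new.env()
    loaded_names <- load(rda_path, envir = env)
    ```

    load() returns the names of the objects it restored, so loaded_names shows which objects the file contains.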

    To create your datasets, just write a script in the inst subdirectory of your package. My own example is here: https://github.com/cran/stacomiR/blob/master/inst/config/generate_data.R

    For instance, I use it to create the r_mig dataset:

    #################################
    # generates dataset for report_mig
    # from the vertical slot fishway located at the estuary of the Vilaine (Brittany)
    # Taxa Liza Ramada (Thinlip grey mullet) in 2015
    ##################################
    
    #{ here some stuff necessary to generate this dataset from my package
    # and database}
    setwd("C:/workspace/stacomir/pkg/stacomir")
    devtools::use_data(r_mig,internal=FALSE,overwrite=TRUE)
    

    This will save your dataset in the appropriate format. Using internal = FALSE makes the dataset available to all users through data(). I suggest that you read the data() help file. You can also use data() to load files when you are not in a package, provided they are in a data subdirectory of the working directory. From the help file:

    If lib.loc and package are both NULL (the default), the data sets are searched for in all the currently loaded packages then in the ‘data’ directory (if any) of the current working directory.
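
    The working-directory behaviour described in that excerpt can be tried outside any package. The following is only a sketch: myproject is a hypothetical directory name, and octants is the vector from the question.

    ```r
    # Hypothetical project directory with a data/ subdirectory (an assumption).
    proj_dir <- file.path(tempdir(), "myproject")
    dir.create(file.path(proj_dir, "data"), recursive = TRUE, showWarnings = FALSE)

    # Save the vector from the question, then drop it from the workspace.
    octants <- c(90, 135, 180, 225, 270, 315, 360, 45)
    save(octants, file = file.path(proj_dir, "data", "octants.rda"))
    rm(octants)

    # With lib.loc and package left as NULL, data() also searches ./data.
    old_wd <- setwd(proj_dir)
    env <- new.env()
    data("octants", envir = env)
    setwd(old_wd)  # restore the previous working directory
    ```

    Loading into a separate environment keeps the example from touching the global workspace; calling data("octants") without envir would create the object in the calling environment instead.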

    If you are using Roxygen, create an R file called data.R where you store the descriptions of all your datasets. Below is an example of the Roxygen documentation for one of the datasets in the stacomiR package.

    #' Video counting of thin lipped mullet (Liza ramada) in 2015 in the Vilaine (France)
    #' 
    #' This dataset corresponds to the data collected at the vertical slot fishway
    #' in 2015, video recording of the thin lipped mullet Liza ramada migration
    #'
    #' @format An object of class report_mig with 8 slots:
    #' \describe{
    #'   \item{dc}{the \code{ref_dc} object with 4 slots filled with data corresponding to the iav postgres schema}
    #'   \item{taxa}{the \code{ref_taxa} the taxa selected}
    #'   \item{stage}{the \code{ref_stage} the stage selected}
    #'   \item{timestep}{the \code{ref_timestep_daily} calculated for all 2015}
    #'   \item{data}{ A dataframe with 10304 rows and 11 variables
    #'          \describe{
    #'              \item{ope_identifiant}{operation id}
    #'              \item{lot_identifiant}{sample id}
    #'              \item{ope_dic_identifiant}{dc id}
    #'              \item{lot_tax_code}{species id}
    #'              \item{lot_std_code}{stage id}
    #'              \item{value}{the value}
    #'              \item{type_de_quantite}{either effectif (number) or poids (weights)}
    #'              \item{lot_dev_code}{destination of the fishes}
    #'              \item{lot_methode_obtention}{method of data collection, measured, calculated...} 
    #'              }
    #'   }
    #'   \item{coef_conversion}{A data frame with 0 observations : no quantity are reported for video recording of mullets, only numbers}
    #'   \item{time.sequence}{A time sequence generated for the report, used internally}
    #' }
    #' @keywords data
    "r_mig"
    

    The full file is here:

    https://github.com/cran/stacomiR/blob/master/R/data.R

    For another example, see http://r-pkgs.had.co.nz/data.html#documenting-data
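
    For a plain numeric vector such as the octants vector in the question, the documentation block can be much shorter than the r_mig one. The following is only a sketch of what an entry in data.R might look like; the title and description wording are my assumptions, not taken from any existing package.

    ```r
    #' Octant angles
    #'
    #' A frequently used vector of eight angle values, intended to be passed
    #' as an argument to the package functions (wording is an assumption).
    #'
    #' @format A numeric vector of length 8.
    #' @keywords datasets
    "octants"
    ```

    The quoted name on the last line tells Roxygen which dataset the block documents, in the same way as "r_mig" in the stacomiR example.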

    You can then use these data in tests by calling data("r_mig"), as follows:

    test_that("Summary method works",
        {
         # ... some other code
    
          data("r_mig")
          r_mig<-calcule(r_mig,silent=TRUE)
          summary(r_mig,silent=TRUE)
          rm(list=ls(envir=envir_stacomi),envir=envir_stacomi)
        })
    

    Most importantly, you can use these datasets in the help pages to show how to use the functions in your package.
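
    As a sketch of that last point, a packaged vector can appear directly in the @examples section of a function's Roxygen block. The body of my_function below is purely hypothetical (the question does not say what it does); here it snaps each angle to the nearest reference value, and octants is defined inline only so that the snippet runs on its own.

    ```r
    # In a package this would come from data/octants.rda; defined here so the
    # sketch is self-contained.
    octants <- c(90, 135, 180, 225, 270, 315, 360, 45)

    #' Snap angles to the nearest reference angle (hypothetical function)
    #'
    #' @param x Numeric vector of angles.
    #' @param breaks Numeric vector of reference angles, e.g. \code{octants}.
    #' @return Numeric vector of the same length as \code{x}.
    #' @examples
    #' my_function(c(100, 350), octants)
    my_function <- function(x, breaks) {
      # Plain absolute difference; circular wrap-around is not handled.
      vapply(x, function(a) breaks[which.min(abs(breaks - a))], numeric(1))
    }

    snapped <- my_function(c(100, 350), octants)
    ```

    Because the example in the help page refers to octants by name, it also shows users that the dataset exists and how it is meant to be passed.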