So I think I kind of know what's going on here, but I've had trouble finding a reference (in SO or the R docs) for it, so I wanted to air this out and see if people can shed any light.
I have an R package that includes the following code at the top level (not inside any function) of a `utils.R` file:
```r
S3_BUCKET <- Sys.getenv('S3_BUCKET')
CACHE_DIR <- if (S3_BUCKET == "") {
  'cache/foo'
} else {
  paste0('s3://', S3_BUCKET, '/dir/cache/foo')
}
print(paste("CACHE_DIR:", CACHE_DIR))
```
This works fine in "development mode" when I load the package via `devtools::load_all('.')`, but when I install the package in my environment and load it via `library(mypkg)`, `Sys.getenv('S3_BUCKET')` in this code always returns the empty string (as verified by checking `mypkg:::S3_BUCKET`).
My hypothesis is that environment variables aren't set up yet at the time the "package-level code" is evaluated during package load. If so, is this documented anywhere, and if it's not, where's the right place to add it to the docs? Or is it a bug that should be fixed?
It also looks like maybe `stdout` isn't set up yet (for the package?) either, because the `print` output never seems to appear.
The solution I'm going with is to convert it to an `.onLoad` callback, which I'm sure is better practice anyway:
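```r
# Mutable container for package-level settings; the default applies
# when S3_BUCKET is not set.
GLOBALS <- new.env()
GLOBALS$CACHE_DIR <- 'cache/foo'

# Runs every time the namespace is loaded, so Sys.getenv() sees the
# environment of the current session rather than the install-time one.
.onLoad <- function(libname, pkgname) {
  S3_BUCKET <- Sys.getenv('S3_BUCKET')
  if (S3_BUCKET != "") {
    assign('CACHE_DIR', paste0('s3://', S3_BUCKET, '/dir/cache/foo'), GLOBALS)
    # CACHE_DIR lives in GLOBALS, not in this function's scope
    print(paste("CACHE_DIR:", GLOBALS$CACHE_DIR))
  }
}
```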
This works as intended.
If the `Sys.getenv` call is at top level, then it is not run each time your package namespace is loaded. It is run exactly once: when you (or CRAN) build a binary from your package's sources via `R CMD INSTALL`.
If you install your package with

```sh
$ env S3_BUCKET=whatever R CMD INSTALL /path/to/package/root
```

then the value of `S3_BUCKET` in your package namespace will be `"whatever"` regardless of the value of the environment variable `S3_BUCKET` when the namespace is loaded.
If you want the `Sys.getenv` call to be evaluated at load time, then you need to place it in the body of `.onLoad`, as documented in `?.onLoad`.
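Besides `.onLoad`, another way to defer the lookup (a sketch, not something the documentation prescribes) is to wrap it in a function, so `Sys.getenv()` runs every time the value is needed rather than once at install time. `cache_dir()` is a hypothetical name; the paths are the ones from the question.

```r
# Hypothetical helper: compute the cache location at call time.
# Because Sys.getenv() sits inside a function body, it is evaluated on
# every call, so it always reflects the current environment.
cache_dir <- function() {
  s3_bucket <- Sys.getenv("S3_BUCKET")
  if (s3_bucket == "") {
    "cache/foo"
  } else {
    paste0("s3://", s3_bucket, "/dir/cache/foo")
  }
}

Sys.setenv(S3_BUCKET = "my-bucket")
print(cache_dir())  # [1] "s3://my-bucket/dir/cache/foo"

Sys.unsetenv("S3_BUCKET")
print(cache_dir())  # [1] "cache/foo"
```

This avoids mutable package state entirely, at the cost of re-reading the environment variable on each call.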
The fact that source code is only evaluated at build time doesn't seem to be stated explicitly in `?utils::build` or in the Writing R Extensions manual (accessible via `help.start()`). There is the slightest clue in this section of WRE:
> Binary packages are compressed copies of installed versions of packages. They contain compiled shared libraries rather than C, C++ or Fortran source code, and the R functions are included in their installed form.
You would have to deduce that "R functions" really means any object defined at top level.
It is really just a software development paradigm. To build a software package is to create the objects defined in the sources and serialize them in binary format. To install a software package is to unpack the binary files somewhere in your file system. Ultimately, users obtain name-value pairs, but not the source code used to generate the values.
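The "evaluate once, serialize, load later" idea can be simulated in a few lines. This is only an analogy, not what `R CMD INSTALL` literally runs (installed packages normally keep their objects in a lazy-load database rather than an `.rds` file), but it shows why the install-time value of `S3_BUCKET` sticks.

```r
# Step 1 ("install time"): top-level code runs while the variable is set.
Sys.setenv(S3_BUCKET = "install-time-bucket")
pkg_env <- new.env()
eval(quote(S3_BUCKET <- Sys.getenv("S3_BUCKET")), envir = pkg_env)

# Step 2: serialize the resulting objects; only name-value pairs are kept,
# not the expression that produced them.
stored <- tempfile(fileext = ".rds")
saveRDS(as.list(pkg_env), stored)

# Step 3 ("load time"): the environment variable now has another value...
Sys.setenv(S3_BUCKET = "load-time-bucket")

# ...but loading only restores the stored values; nothing is re-evaluated.
loaded <- list2env(readRDS(stored), envir = new.env())
print(loaded$S3_BUCKET)         # [1] "install-time-bucket"
print(Sys.getenv("S3_BUCKET"))  # [1] "load-time-bucket"
```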
`devtools` muddies the water a bit, because `load_all` sources your `.R` files in a new environment and then attaches that environment to your search path. It does this on the spot, skipping the usual build and install process. That is often convenient, but it can lead to headaches if you aren't aware of all of the caveats.