My colleagues and I routinely create ad hoc scripts in R, to perform ETL on proprietary data and generate automated reports for clients. I am attempting to standardize our approach, for the sake of consistency, modularity, and reusability.
In particular, I want to consolidate our most commonly used functions in a central directory, and to access them as if they were functions from a proprietary R package. However, I am quite raw as an R developer, and my teammates are even less experienced in R development. As such, the development of a formal package is unfeasible for the moment.
Fortunately, the box package, by Stack Overflow's very own Konrad Rudolph, provides (among other modularity features) an accessible way to approximate the behavior of an R package. Unlike the rigorous development process outlined by the RStudio team, box requires only that one create a regular .R file, in a meaningful location, with roxygen2 documentation (#') and explicit @exports:
Writing modules
The module bio/seq, which we have used in the previous section, is implemented in the file bio/seq.r. The file seq.r is, by and large, a normal R source file, which happens to live in a directory named bio.
In fact, there are only three things worth mentioning:
Documentation. Functions in the module file can be documented using ‘roxygen2’ syntax. It works the same as for packages. The ‘box’ package parses the documentation and makes it available via box::help. Displaying module help requires that ‘roxygen2’ is installed.
Export declarations. Similar to packages, modules explicitly need to declare which names they export; they do this using the annotation comment #' @export in front of the name. Again, this works similarly to ‘roxygen2’ (but does not require having that package installed).
⋮
At the moment, I am tinkering around with a particular module, as "imported" into a script. While the "import" itself works seamlessly, I cannot seem to access the documentation for my functions.
I am experimenting with box on a Lenovo ThinkPad running Windows 10 Enterprise. I have created a script, aptly titled Script.R, whose location serves as my working directory. My module exists in the relative subdirectory ./Resources/Modules as the humble file time.R, reproduced here:
###########################
## Relative Date Windows ##
###########################
#' @title Past Day of Week
#' @description Determine the date of the given weekday that fell a given number
#' of weeks before the given date.
#' @param from \code{Date} object. The point of reference, from which we go
#' backwards. Defaults to current \code{Sys.Date()}.
#' @param back \code{integer}. The number of weeks to go backward from the point
#' of reference; negative values go forward. Defaults to \code{1}, for last
#' week. Weeks begin on \code{"Monday"}.
#' @param weekday \code{character}. The weekday within the week targeted by
#' \code{back}; one of \code{c("Monday", "Tuesday", "Wednesday", "Thursday",
#' "Friday", "Saturday", "Sunday")}. Defaults to \code{"Monday"}.
#' @return The date of the \code{weekday} falling in the week \code{back} weeks
#' prior to the week in which \code{from} falls.
#' @export
past_weekday <- function(from = Sys.Date(),
                         back = 1,
                         weekday = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
) {
  cycle <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
  from <- as.Date(from)
  back <- as.integer(back)
  weekday_index <- (which(cycle == weekday[1]) - 1) %% 7
  from_index <- (which(cycle == "Sunday") + as.POSIXlt(from)$wday - 1) %% 7
  weekdate <- as.Date(from) - lubridate::days(from_index) - lubridate::weeks(as.numeric(back)) + lubridate::days(weekday_index)
  return(as.Date(weekdate))
}
Observe the roxygen2 documentation, as indicated by the special #' comments and @ tags. The documentation for past_weekday() is of far greater interest to me than the function itself.
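For anyone who wants to check the date arithmetic without loading the module or lubridate, here is a minimal base-R sketch of the same Monday-first indexing. The name past_weekday_base is hypothetical and is not part of time.R; it assumes the same conventions as the documentation above.

```r
# Hypothetical base-R counterpart to past_weekday(); not part of time.R.
past_weekday_base <- function(from = Sys.Date(), back = 1L, weekday = "Monday") {
  cycle <- c("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")
  weekday <- match.arg(weekday, cycle)
  from <- as.Date(from)
  # as.POSIXlt()$wday counts Sunday as 0; shift so that Monday is 0.
  from_index <- (as.POSIXlt(from)$wday + 6L) %% 7L
  weekday_index <- match(weekday, cycle) - 1L
  # Step back to this week's Monday, then back `back` weeks, then forward
  # to the requested weekday.
  from - from_index - 7L * as.integer(back) + weekday_index
}

# 2021-07-28 was a Wednesday; the Monday of the previous week was 2021-07-19.
past_weekday_base(as.Date("2021-07-28"), back = 1, weekday = "Monday")
```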
Last and certainly least, here is Script.R itself:
# Set the working directory to the location of this very script.
setwd(this.path::this.dir())
# Access the functions in 'time.R' by relative location.
box::use(Resources/Modules/time)
# Run the function with its default values.
time$past_weekday()
# View the help page for the function.
box::help(time$past_weekday)
In theory, that final line should display the documentation for past_weekday() via box::help(). The box vignette gives a simple example to that effect:
We can also display the interactive help for individual names using the box::help function, e.g.:
box::help(seq$revcomp)
The first three lines of Script.R give me exactly what I desire. That is, they load time into R as an environment, from which I can access past_weekday() via time$past_weekday(). This module$function() syntax is analogous to the qualification of functions from formal packages: package::function(). Indeed, past_weekday() itself works just as expected:
time$past_weekday()
# [1] "2021-07-19"
However, when I attempt to interactively access the documentation
box::help(time$past_weekday)
the console displays the following warnings
Warning messages:
1: In utils::packageDescription(package, fields = "Version") :
no package 'PKG' was found
2: In file.create(to[okay]) :
cannot create file 'C:\Users\greg\AppData\Local\Temp\RtmpYBTTyG/.R/doc/html/module:Resources/Modules/time.html', reason 'Invalid argument'
and the interactive help window is empty but for an error message.
For my team, this could prove a serious issue. Since we often rely on useful functions written by each other, it is crucial that any user on our team be able to easily access clear documentation by the author of the function...just as the user is accustomed to doing with formal R packages. Without this ability, the user must either bug the author for clarification, or blunder ahead without a clear understanding of the function's purpose and limitations.
When I read the warning
In file.create(to[okay]) :
cannot create file 'C:\Users\greg\AppData\Local\Temp\RtmpYBTTyG/.R/doc/html/module:Resources/Modules/time.html', reason 'Invalid argument'
I was drawn to the filepath
C:\Users\greg\AppData\Local\Temp\RtmpYBTTyG/.R/doc/html/module:Resources/Modules/time.html
as the cause of the Invalid argument error from file.create(). To my knowledge, a directory name .../module:Resources/... containing a colon : is illegal on Windows, at the very least.
Indeed, when I supply another illegal filepath ./illegal:directory:name/missing.txt to file.create()
file.create('./illegal:directory:name/missing.txt')
# [1] FALSE
I get the same warning:
Warning message:
In file.create("./illegal:directory:name/missing.txt") :
cannot create file './illegal:directory:name/missing.txt', reason 'Invalid argument'
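To make the suspicion concrete, here is a rough base-R sketch that checks each component of a path for characters that Windows reserves in file names. The helper has_reserved_chars is hypothetical (it is not part of box or base R), and it only covers the reserved punctuation, not reserved device names such as CON.

```r
# Punctuation that Windows does not allow within a file or directory name.
windows_reserved <- '[<>:"|?*]'

# Hypothetical helper: TRUE if any path component contains a reserved character.
has_reserved_chars <- function(path) {
  # Normalize backslashes to forward slashes, then split into components.
  parts <- strsplit(gsub("\\\\", "/", path), "/", fixed = TRUE)[[1]]
  # A leading drive specifier such as "C:" is legitimate, so ignore it.
  parts <- parts[!grepl("^[A-Za-z]:$", parts)]
  any(grepl(windows_reserved, parts))
}

# The temporary help file from the warning: the "module:Resources" component trips the check.
has_reserved_chars("C:/Users/greg/AppData/Local/Temp/RtmpYBTTyG/.R/doc/html/module:Resources/Modules/time.html")
# [1] TRUE

# The module file itself is fine.
has_reserved_chars("./Resources/Modules/time.R")
# [1] FALSE
```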
The culprit appears to be this line in help.R:
display_help(doc, paste0('module:', mod_name), help_type)
#                               ^
#                              Here
However, this seems far too simple a diagnosis. Frankly, I would be quite surprised to find such a portability issue within a package designed by a seasoned developer. I find it overwhelmingly more likely that I am simply out of my depth.
I tried it on my MacBook Air, running Mojave, and it actually worked! While I still got the first (rather odd) warning message on the console
Warning message:
In utils::packageDescription(package, fields = "Version") :
no package 'PKG' was found
the interactive help window does display the intended documentation.
Naturally, this does not exactly solve my problem: the scripts will be executed on a VM running Windows, just like my Lenovo and every other computer used at my company. However, it does support the hypothesis that this issue is specific to box on Windows.
Konrad has kindly confirmed that this is indeed a bug, and he's working on a fix. Many thanks to Konrad for his clarification and responsiveness!
As noted, that’s a bug, now fixed.
But since we’re here, a word on usage:
# Set the working directory to the location of this very script.
setwd(this.path::this.dir())
This is generally not recommended. To quote Jenny Bryan:
If the first line of your R script is
setwd("C:\Users\jenny\path\that\only\I\have")
I will come into your office and SET YOUR COMPUTER ON FIRE 🔥.
‘box’ also doesn’t need this; instead, the idea is to configure a global module search path (equivalent to R’s package library, see .libPaths()), e.g. via the box.path option (this would usually go into the user’s .Rprofile configuration, not in the script itself!):
options(box.path = 'C:/User/Konrad/some/path')
Afterwards, modules that are installed in this search path will be found by box::use.
As noted, this should be a global setting. To use project-specific modules, you wouldn’t set a global search path; instead, you’d use relative imports:
box::use(./Resources/Modules/time)
This should work regardless of the working directory; it uses the calling script’s location instead. Consequently, this.path::this.dir() or similar hacks are never necessary with ‘box’. And to find data files, ‘box’ provides the box::file function, which also works regardless of the current working directory.
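Put together, a setup along those lines might look roughly like the following sketch. The search path and the data file name 'Resources/clients.csv' are hypothetical placeholders, and the box::file call assumes the file sits next to the script in that subdirectory.

```r
# In the user's .Rprofile, NOT in the script (hypothetical path; note the forward slashes).
options(box.path = 'C:/Users/Konrad/some/path')

# In Script.R: import the project-specific module relative to the script itself.
# No setwd() call is needed.
box::use(./Resources/Modules/time)

time$past_weekday()

# Resolve a data file relative to the script ('Resources/clients.csv' is a hypothetical name).
data_path <- box::file('Resources/clients.csv')
```

The leading ./ is what marks the import as relative to the calling script rather than to the global search path.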