Developing R package, testing with `foreach`, while running simulations at same time with different package version

I write almost all my R code in packages at work (and use git). I make heavy use of devtools, in particular the shortcuts for load_all and so on, as I update functions in a package. I have a rough understanding of devtools, in that load_all makes a temporary copy of the package, and I really like this workflow for testing function updates in packages.

Is there a nice easy way/workflow for running simulations depending on the package, while developing it at the same time, without "breaking" those simulations?

I suspect there is an easy solution that I've overlooked.

Right now what I do is:

1. Get the package "mypackage" to a point where it is ready for running simulations. Copy the whole folder containing the project and run the simulations in the copied folder under a new package name ("mypackage2"). The simulation scripts call library(mypackage2) but NOT library(mypackage), which annoyingly means I need to update every library(mypackage) call to library(mypackage2). If I instead run the simulations with library(mypackage), I need to make sure the currently built version of mypackage is the 'old' one that doesn't reflect the updates in 2. below (but 2. below requires rebuilding the package too!). Handling all this gets messy.
2. While the simulations are running in the copied folder, I update the functions in "mypackage", either using load_all or by rebuilding the package. I often need to rebuild, because load_all alone isn't workable when I want to test functions that run small parallel simulations with doParallel and foreach (on Windows): the child processes are new R sessions that load the installed "mypackage", so they only see my changes after a rebuild. I understand that when a package is built in R, it gets stored in ..\R\R-3.6.1\library, and when future R sessions call library(mypackage) they will use that version of the package.

What I'd ideally like to be able to do is, in the same original folder, run simulations with a version of mypackage, and then update the code in the package while simulations are stopped/started, confident my development changes won't break the simulations which are running a specific version of the package.

Is there a simple way of doing the above, without having to recopy folders (and make things like "mypackage2")?

thanks

The issue described in the question "Specify package location in foreach" is somewhat similar to what I am facing.

The problem is that if I run a simulation that takes several days using "mypackage", with many calls to foreach, and update and rebuild "mypackage" when testing changes, future foreach calls from the simulation may pick up the new updated version of the package, which would be a disaster.

Solution

I think the answers in the other question do apply, but you need to do some extra steps.

Let's say you have a version of the package you want to test. You'd still create a specific folder for that version, but you leave it empty. Here I'll use /tmp/mypkg2 as an example. While having your project open in RStudio, you execute:

withr::with_libpaths(c("/tmp/mypkg2", .libPaths()), devtools::install())

That will install that version of the package to the provided folder.
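One detail worth checking before the install step: .libPaths() silently ignores directories that do not exist, which is why the folder has to be created first. Below is a minimal sketch of that check using only base R. The location under tempdir() is a stand-in for /tmp/mypkg2, and the variable names are made up for illustration.

```r
# Hypothetical location for the frozen package version (stand-in for /tmp/mypkg2).
lib_dir <- file.path(tempdir(), "mypkg2")

# .libPaths() drops directories that do not exist,
# so create the folder before pointing R at it.
dir.create(lib_dir, recursive = TRUE, showWarnings = FALSE)

# Remember the current search path so it can be restored afterwards.
old_paths <- .libPaths()
.libPaths(c(lib_dir, old_paths))

# The new folder should now be the first place R looks for packages.
first_path <- .libPaths()[1L]
is_first <- identical(normalizePath(first_path), normalizePath(lib_dir))
print(is_first)

# Restore the original search path when done.
.libPaths(old_paths)
```

If is_first comes back FALSE, the folder was most likely missing or misspelled, and an install aimed at it would not end up where you expect.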

You could then have a wrapper script, say wrapper.R, with something like:

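# Library folder holding the package version to use, passed on the command line
pkg_path <- commandArgs(trailingOnly = TRUE)[1L]
if (is.na(pkg_path)) stop("Usage: Rscript wrapper.R <library folder>")

cat("Using package at", pkg_path, "\n")

# Search pkg_path first, then the standard library locations
.libPaths(c(pkg_path, .libPaths()))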

library(doParallel)

workers <- makeCluster(detectCores())
registerDoParallel(workers)

# We need to modify the lib path in each worker too
parallel::clusterExport(workers, "pkg_path")
parallel::clusterEvalQ(workers, .libPaths(c(pkg_path, .libPaths())))

# ... Your code calling your package and doing stuff

parallel::stopCluster(workers)
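To confirm that the workers really picked up the folder, you could ask each of them which library location it searches first. This is only a sketch: it uses a two-worker cluster and a throwaway folder under tempdir() in place of the real pkg_path, and the names worker_first and all_ok are illustrative.

```r
library(parallel)

# Hypothetical library folder; in wrapper.R this would come from commandArgs().
pkg_path <- file.path(tempdir(), "mypkg2")
dir.create(pkg_path, recursive = TRUE, showWarnings = FALSE)

# A small cluster is enough for the check.
workers <- makeCluster(2L)

# Send pkg_path to the workers and prepend it to their library search path.
clusterExport(workers, "pkg_path", envir = environment())
invisible(clusterEvalQ(workers, .libPaths(c(pkg_path, .libPaths()))))

# Ask every worker which library folder it searches first.
worker_first <- unlist(clusterEvalQ(workers, .libPaths()[1L]))
all_ok <- all(normalizePath(worker_first) == normalizePath(pkg_path))
print(all_ok)

stopCluster(workers)
```

In the real script the same clusterEvalQ() check could go right after the library path is set on the workers, before any simulation code runs.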

Afterwards, from the command line (outside of R/RStudio), you could type (assuming Rscript is in your path):

Rscript path/to/wrapper.R /tmp/mypkg2

This way, the actual testing code can stay the same (including calls to library) and R will automatically search first in pkg_path, loading your specific package version, and then searching in the standard locations for any dependencies you may have.
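For a simulation that runs for days, it may also be worth adding a guard that stops early if the package is found somewhere other than the expected folder. The helper below is a hypothetical sketch, not part of any package. It relies on find.package() from base R and is demonstrated on the "stats" package, since "mypackage" is not available here.

```r
# Hypothetical helper: TRUE if `pkg` is found inside the library folder `lib_dir`.
loaded_from <- function(pkg, lib_dir) {
  # find.package() reports where R finds the package: the loaded copy if
  # the namespace is already loaded, otherwise the first match in .libPaths().
  pkg_dir <- find.package(pkg)
  identical(normalizePath(dirname(pkg_dir)), normalizePath(lib_dir))
}

# Demonstration with a base package, which lives in R's default library.
print(loaded_from("stats", .Library))

# In a simulation script the guard might look like this (assumed names):
# library(mypackage)
# if (!loaded_from("mypackage", pkg_path)) stop("mypackage loaded from the wrong library")
```

Running the same check inside each worker with clusterEvalQ() would cover the foreach calls as well.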