r linux docker ubuntu

How strongly can system software interfere with R package functionality?


Background:

I am trying to set up a persistent build for a Docker image that will in turn power the core of an R statistics process. I have figured out how to install exactly the R packages I request. However, I wonder how much the software supplied by the underlying system (in my case Ubuntu 20.04) matters for reproducibility in R. I install the system packages via apt-get install, without specifying versions.

Questions:

  1. What can happen, in regard to R Package functionality, when I rebuild the image later with all the same R packages specified but potentially different system libraries?
  2. How big can the influence be and what are the remedies?

Any guidance is appreciated.


Solution

  • This question is quite broad/vague, but you should probably worry first about

    • linear algebra libraries (BLAS/LAPACK)
    • compiler versions
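
    As a rough sketch of how to see what a given image is actually using, the snippet below collects version information from inside R. The function name linalg_snapshot is made up for illustration; La_version() and extSoftVersion() are base R functions, and the "BLAS" entry of extSoftVersion() is only present on reasonably recent R versions.

    ```r
    # Collect version information for the numerical stack R is linked against.
    # `linalg_snapshot` is a hypothetical helper name, not part of any package.
    linalg_snapshot <- function() {
      list(
        r_version      = R.version.string,  # the R release in this image
        lapack_version = La_version(),      # LAPACK version R is using
        external       = extSoftVersion()   # named vector of external library versions
      )
    }

    snap <- linalg_snapshot()
    str(snap)
    ```

    Saving this output alongside each image build gives you something to compare when results change. The compiler is not captured here; running R CMD config CC on the command line reports the C compiler R was configured with.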

    Beyond that, it will depend on whether the packages you are loading use additional system libraries (see the SystemRequirements: field in the DESCRIPTION file of the package, or on the CRAN web page). For example, sf (a package for spatial data processing) lists

    C++11, GDAL (>= 2.0.1), GEOS (>= 3.4.0), PROJ (>= 4.8.0), sqlite3
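
    To check this field for packages you already have installed, something like the following should work. The helper name system_requirements is hypothetical; it only wraps utils::packageDescription(), which returns NA when the field is absent or the package is not installed.

    ```r
    # Return the SystemRequirements field of an installed package,
    # or NA if the package declares none or is not installed.
    # `system_requirements` is a hypothetical helper name.
    system_requirements <- function(pkg) {
      suppressWarnings(
        utils::packageDescription(pkg, fields = "SystemRequirements")
      )
    }

    # A base package that does not declare any system requirements
    system_requirements("stats")

    # With sf installed, this would return the string quoted above:
    # system_requirements("sf")
    ```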
    

    Speaking only for the first two (compiler and linear algebra libraries), the differences will be at the level of floating-point precision. To the extent that the numerical methods used are robust and the statistical problems you're working with are stable and well-posed, the differences will be small enough to handle with standard best practices for floating-point comparison (e.g., using all.equal() rather than == or identical()).
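
    A minimal illustration of that last point, using the classic 0.1 + 0.2 example rather than an actual BLAS or compiler difference (the variable names are arbitrary):

    ```r
    x <- 0.1 + 0.2
    y <- 0.3

    exact_equal   <- x == y                   # FALSE: values differ in the last bits
    ident_equal   <- identical(x, y)          # FALSE: identical() is also exact
    approx_equal  <- isTRUE(all.equal(x, y))  # TRUE: equal within a tolerance

    print(c(exact = exact_equal, identical = ident_equal, all.equal = approx_equal))
    ```

    all.equal() returns a character description rather than FALSE when values differ, so wrapping it in isTRUE() is the usual way to use it in a condition. Its tolerance argument can be loosened if results computed against different system libraries differ slightly more than the default allows.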