
How do I copy installed R packages from the 1st stage to the 2nd stage in a multi-stage build based on an R-base image?


I'm trying to build an image based on R-base using a multi-stage build. How can I copy the installed packages from the 1st stage into the 2nd stage, and nothing else?

The current file gives me basically a 'packageless' R-base image: the packages installed in the 1st stage are 'lost' somewhere.

I think it has something to do with making and choosing the correct directories. This is a confusing part for me, since I'm fairly new to dockerizing applications.

Thanks for all your help!

Below is my current file:

    # Base image
    FROM rocker/r-base:latest AS stage1
    
    ## install binary, build and dependent packages
    RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
        r-cran-pdftools \
        r-cran-dplyr \
        r-cran-stringr \
        libxml2-dev \
        libssl-dev && \
        echo "r <- getOption('repos');r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile && \
        Rscript -e "install.packages(c('AzureStor'))"
    
    ##2nd stage, pulling 'fresh' base image
    FROM rocker/r-base:latest
    
    #COPY packages from 1st stage
    COPY --from=stage1 /usr/local/lib/R/site-library /usr/local/lib/R/site-library
    
    ## create directories
    RUN mkdir -p /script
    
    #Copy scripts
    COPY /script /script
    
    ## Set workdir
    WORKDIR /script
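
To see where R puts packages inside the image, the library search path can be printed during the build. The sketch below assumes the stage name stage1 from the file above; with BuildKit, the RUN output is visible when building with `docker build --progress=plain .`.

```dockerfile
FROM rocker/r-base:latest AS stage1

# Print the directories R searches for packages, in order.
# By default install.packages() installs into the first entry, whereas
# Debian r-cran-* packages installed with apt-get live in /usr/lib/R/site-library.
RUN Rscript -e "print(.libPaths())"
```

If the r-cran-* packages end up in /usr/lib/R/site-library, copying only /usr/local/lib/R/site-library leaves them behind, together with the shared system libraries apt-get installed for them, which would be consistent with the 'packageless' result described above.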

Solution

  • In the comments you note that you want to get rid of any excess 'weight'. Such weight typically comes from installed development tools and packages. The rocker/r-base image already brings in quite a bit of it, since it has r-base-dev and its dependencies installed. We can, however, avoid adding more by keeping only the run-time dependencies in the final image and leaving out the build-time dependencies. For an R package, the build-time dependencies that are not needed at run time are typically development files such as the header files of system libraries: you don't need libxml2-dev at run time, for example, because libxml2 is enough. I see several possible approaches to this.

    First, you could use binary packages for those packages that need compilation against system libraries. I have not checked the dependencies for AzureStor, but it might well be that all the required R packages exist as compiled Debian packages. These depend only on the run-time libraries, which keeps the image size small and the build time short. Your Dockerfile would look something like this:

    FROM rocker/r-base:latest
    
    ## install binary, build and dependent packages
    RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
        r-cran-pdftools \
        r-cran-dplyr \
        r-cran-stringr \
        r-cran-... \
        r-cran-... && \
        Rscript -e "install.packages(c('AzureStor'))" && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/* && \
        rm -rf /tmp/*
    
    ## create directories
    RUN mkdir -p /script 
    
    #Copy scripts
    COPY /script /script
    
    ## Set workdir
    WORKDIR /script
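    
    Whether this approach covers AzureStor depends on which of its dependencies are packaged for Debian. A rough check is a throwaway build like the one below. It is a sketch that assumes Debian's naming convention r-cran-<lowercase CRAN name> and uses https://cloud.r-project.org as the CRAN mirror; a dependency reported without a binary may still exist under another name, so treat the output as a starting point.

    ```dockerfile
    # Throwaway image: lists AzureStor's recursive CRAN dependencies that are
    # not yet installed and checks whether an r-cran-* binary exists for each.
    FROM rocker/r-base:latest
    RUN apt-get update && \
        Rscript -e "db <- available.packages(repos = 'https://cloud.r-project.org'); deps <- tools::package_dependencies('AzureStor', db = db, recursive = TRUE)[['AzureStor']]; deps <- setdiff(deps, rownames(installed.packages())); writeLines(paste0('r-cran-', tolower(deps)))" > /tmp/candidates.txt && \
        for p in $(cat /tmp/candidates.txt); do \
            if apt-cache show "$p" > /dev/null 2>&1; then echo "binary available: $p"; \
            else echo "no binary: $p"; fi; \
        done
    ```

    Building it with `docker build --progress=plain .` shows the result of each check.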
    

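    Second, you could install both the build- and run-time dependencies, install the R packages from source, and then remove the build-time dependencies, all within one RUN instruction. Keeping these steps in a single RUN matters, because files deleted in a later instruction still occupy space in the earlier image layer: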

    FROM rocker/r-base:latest
    
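    ## install binary, build and dependent packages
    ## (run-time library names depend on the Debian release of the base image,
    ##  e.g. libssl1.1 on Debian 11 vs. libssl3 on Debian 12)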
    RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
        r-cran-pdftools \
        r-cran-dplyr \
        r-cran-stringr \
        libxml2-dev libxml2 \
        libssl-dev libssl1.1 && \
        Rscript -e "install.packages(c('AzureStor'))" && \
        apt-get purge --yes libxml2-dev libssl-dev && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/* && \
        rm -rf /tmp/*
    
    
    ## create directories
    RUN mkdir -p /script 
    
    #Copy scripts
    COPY /script /script
    
    ## Set workdir
    WORKDIR /script
    

    Finally, you could use a multistage build with three stages:

    1. Add the run-time dependencies.
    2. Add the build-time dependencies and install packages into /usr/local/lib/R/site-library.
    3. Use stage 1 as the base and copy in the packages from stage 2.

    So something like this:

    # Base image
    FROM rocker/r-base:latest AS stage1
    
    ## install binary packages and run-time dependencies
    RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
        r-cran-pdftools \
        r-cran-dplyr \
        r-cran-stringr \
        libxml2 \
        libssl1.1 && \
        apt-get clean && \
        rm -rf /var/lib/apt/lists/* && \
        rm -rf /tmp/*
    
    FROM stage1 AS stage2
    RUN apt-get update && apt-get install -y -qq --no-install-recommends --purge \
        libxml2-dev \
        libssl-dev && \
        Rscript -e "install.packages(c('AzureStor'))"
    
    
    FROM stage1
    
    COPY --from=stage2 /usr/local/lib/R/site-library /usr/local/lib/R/site-library
    
    ## create directories
    RUN mkdir -p /script
    
    #Copy scripts
    COPY /script /script
    
    ## Set workdir
    WORKDIR /script
    

    I have personally used the first and second approaches. I have not tested the third approach, but I expect it to work as well.
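
    Because the third approach is untested, one cheap safeguard, offered here as a sketch rather than a tested recipe, is to load every package at the end of the build, so that a missing R package or shared library makes `docker build` fail instead of the container failing at run time. The final stage of the three-stage file with such a check, using the package list from above, could look like this:

    ```dockerfile
    FROM stage1

    COPY --from=stage2 /usr/local/lib/R/site-library /usr/local/lib/R/site-library

    # library() raises an error if a package or a shared library it links
    # against is missing; Rscript then exits non-zero and the build stops.
    RUN Rscript -e "for (p in c('pdftools', 'dplyr', 'stringr', 'AzureStor')) library(p, character.only = TRUE)"

    ## Copy scripts (COPY and WORKDIR create /script if it does not exist)
    COPY /script /script
    WORKDIR /script
    ```

    An intermediate stage can also be built and inspected on its own with `docker build --target stage2 .`.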