Tags: r, conda, snakemake, mamba

Using R in a Snakemake workflow with Mambaforge


I'm building a pipeline with Snakemake. One rule runs an R script that reads a CSV file with readr. When I run the pipeline with --use-singularity and --use-conda, I get this error:

Error: Unknown TZ UTC
In addition: Warning message:
In OlsonNames() : no Olson database found
Execution halted

Google suggests readr is crashing because tzdata is missing, but I can't figure out how to install the tzdata package and make readr see it. I am running the entire pipeline in a Mambaforge container to ensure reproducibility. Snakemake recommends Mambaforge over a Miniconda container because it's faster, but I think Mambaforge is involved in the error, since switching to a Miniconda container makes it go away.

Here's a workflow to reproduce the error:

#Snakefile
singularity: "docker://condaforge/mambaforge"

rule targets:
    input:
        "out.txt"

rule readr:
    input:
        "input.csv"
    output:
        "out.txt"
    conda:
        "env.yml"
    script:
        "test.R"

#env.yml
name: env
channels:
    - default
    - bioconda
    - conda-forge
dependencies:
    - r-readr
    - tzdata

#test.R
library(readr)
fp <- snakemake@input[[1]]
df <- read_csv(fp)
print(df)
write(df$x, "out.txt")

I run the workflow with snakemake --use-conda --use-singularity. How do I run R scripts when the Snakemake workflow is running from a Mambaforge singularity container?


Solution

  • Looking through the R code that leads to the error, I see that it checks several default locations for the zoneinfo folder that tzdata provides, and that it also honors a TZDIR environment variable.
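
To see which of these locations exist inside the container, a small diagnostic along the following lines may help. This is only a sketch: `zoneinfo_report` is a hypothetical helper, and the candidate paths are an illustrative subset (the TZDIR variable, the usual Linux system path, and the active Conda environment), not the full list R searches.

```r
## Diagnostic sketch (hypothetical helper): report which zoneinfo
## directories are visible from the current R session.
zoneinfo_report <- function(prefix = Sys.getenv("CONDA_PREFIX")) {
  candidates <- c(
    TZDIR  = Sys.getenv("TZDIR"),        # empty string if unset
    system = "/usr/share/zoneinfo",      # usual system-level location on Linux
    conda  = if (nzchar(prefix)) file.path(prefix, "share", "zoneinfo") else ""
  )
  data.frame(
    source = names(candidates),
    path   = unname(candidates),
    exists = unname(nzchar(candidates) & dir.exists(candidates)),
    stringsAsFactors = FALSE
  )
}

print(zoneinfo_report())
```

If only the `conda` row reports an existing directory, that would be consistent with R failing to find the database on its own.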

    I believe the proper fix is for the Conda tzdata package to set TZDIR so that it points to the package's own zoneinfo directory. That will require a PR to the Conda Forge package (see repo issue). In the meantime, either of the following workarounds should do.

    Workaround 1: Set TZDIR from R

    Continuing to use the tzdata package from Conda, one could set the environment variable at the start of the R script.

    #!/usr/bin/env Rscript
    
    ## assumes an active Conda environment with `tzdata` installed;
    ## set TZDIR before any time zone lookup (e.g., before calling read_csv)
    Sys.setenv(TZDIR = file.path(Sys.getenv("CONDA_PREFIX"), "share", "zoneinfo"))
    

    I would consider this a temporary workaround.
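
A slightly more defensive variant of the snippet above, as a sketch: it only sets TZDIR when the expected directory actually exists, and warns otherwise. `set_conda_tzdir` is a hypothetical helper name, and the code assumes that Snakemake has activated the Conda environment so that CONDA_PREFIX is set.

```r
## Sketch (hypothetical helper): point R at the Conda-provided zoneinfo
## directory, but only if that directory is really there.
set_conda_tzdir <- function(prefix = Sys.getenv("CONDA_PREFIX")) {
  tz_path <- file.path(prefix, "share", "zoneinfo")
  if (nzchar(prefix) && dir.exists(tz_path)) {
    Sys.setenv(TZDIR = tz_path)
    return(invisible(TRUE))
  }
  warning("No zoneinfo directory found at ", tz_path, "; TZDIR left unchanged")
  invisible(FALSE)
}

## Call this at the top of the script, before reading any data:
set_conda_tzdir()
## library(readr)
## df <- read_csv(snakemake@input[[1]])
```

The warning makes a missing tzdata installation visible early, instead of surfacing later as the "Unknown TZ UTC" error.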

    Workaround 2: Derive a New Docker Image

    Otherwise, build a new Docker image that includes a system-level tzdata installation. This appears to be a common issue, so, following other examples (and keeping the image clean), it would look something like:

    Dockerfile

    FROM --platform=linux/amd64 condaforge/mambaforge:latest
    
    ## include tzdata
    RUN apt-get update > /dev/null \
      && DEBIAN_FRONTEND="noninteractive" apt-get install --no-install-recommends -y tzdata > /dev/null \
      && apt-get clean
    

    Build this image, push it to Docker Hub, and use it in place of the Mambaforge image in the Snakefile's singularity directive. This is probably a more reliable long-term solution, but perhaps not everyone wants to create a Docker Hub account.