
Snakemake + Docker example: how to use volumes


Let's start with a simple Snakefile like this:

rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    shell:
        "somecommand {input} {output}"

I want to generalize the plot rule so that it can run inside a Docker container, with something like:

rule targets:
    input:
        "plots/dataset1.pdf",
        "plots/dataset2.pdf"

rule plot:
    input:
        "raw/{dataset}.csv"
    output:
        "plots/{dataset}.pdf"
    singularity:
        "docker://joseespinosa/docker-r-ggplot2"
    shell:
        "somecommand {input} {output}"

If I understand correctly, when I run snakemake --use-singularity, somecommand runs inside the container, where the input CSV files cannot be found unless the container is given some volume configuration.

Can you please provide a small working example describing how volumes can be configured in the Snakefile or other Snakemake files?


Solution

  • To have Snakemake run rules inside Singularity images, invoke it like this:

    snakemake --use-singularity

    You can also pass additional arguments to singularity, including bind points, like this:

    snakemake --use-singularity --singularity-args "-B /path/outside/container/:/path/inside/container/"

    Now, if your CSV file is in /path/outside/container/ on the host, somecommand can read it under /path/inside/container/ inside the container.
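With more than one bind point, the --singularity-args string gets tedious to type by hand. Here is a minimal Python sketch that assembles the invocation from a table of host and container paths. The helper names singularity_bind_args and snakemake_command are made up for illustration and are not part of Snakemake. It assumes paths without spaces and relies on Singularity accepting repeated -B options.

```python
import shlex


def singularity_bind_args(binds):
    """Build the --singularity-args value from a {host_path: container_path} dict.

    Assumes paths contain no spaces; each pair becomes one "-B host:container" option.
    """
    return " ".join(f"-B {host}:{container}" for host, container in binds.items())


def snakemake_command(binds):
    """Return the snakemake invocation as an argument list (usable with subprocess.run)."""
    return [
        "snakemake",
        "--use-singularity",
        "--singularity-args",
        singularity_bind_args(binds),
    ]


# Placeholder paths taken from the command shown above
binds = {"/path/outside/container/": "/path/inside/container/"}
cmd = snakemake_command(binds)

# Print a shell-quoted version that can be pasted into a terminal
print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would launch the workflow directly, without going through the shell.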

    Bear in mind, if your inside and outside paths are not identical, you'll need to use both paths in your snakemake rule, in different sections. This is how I've done it:

    rule targets:
        input:
            "plots/dataset1.pdf",
            "plots/dataset2.pdf"
    
    rule plot:
        input:
            "raw/{dataset}.csv"
        output:
            "plots/{dataset}.pdf"
        params:
            i = "inside/container/input/{dataset}.csv",
            o = "inside/container/output/{dataset}.pdf"
        singularity:
            "docker://joseespinosa/docker-r-ggplot2"
        shell:
            "somecommand {params.i} {params.o}"
    

    When you run this Snakefile, bind raw/ to inside/container/input/ and plots/ to inside/container/output/. Snakemake looks for the input and output files on your local machine but hands the container a command that uses the inside-container paths, so both sides find the files where they expect them.

    TL;DR: Local paths in input and output, container paths in params and shell. Bind local and container paths in the command line invocation.
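Typing every path twice (once local, once inside the container) is easy to get out of sync. One possible way around that, sketched below in plain Python, is to derive the container path from the local one using a table of bind points. The BINDS table and the to_container_path helper are hypothetical names introduced here. The directories are the ones used in the rule above.

```python
from pathlib import PurePosixPath

# Hypothetical bind table: local directory -> directory as seen inside the container
BINDS = {
    "raw": "inside/container/input",
    "plots": "inside/container/output",
}


def to_container_path(local_path, binds=BINDS):
    """Rewrite a local path to the path the container sees, based on the bind table."""
    path = PurePosixPath(local_path)
    for host_dir, container_dir in binds.items():
        try:
            relative = path.relative_to(host_dir)
        except ValueError:
            continue  # local_path is not under this bound directory, try the next one
        return str(PurePosixPath(container_dir) / relative)
    raise ValueError(f"{local_path} is not under any bound directory")


print(to_container_path("raw/dataset1.csv"))
print(to_container_path("plots/dataset1.pdf"))
```

Since a Snakefile is Python underneath, a helper like this could be defined at the top of the Snakefile and used from params through a function, for example i = lambda wildcards, input: to_container_path(input[0]). Treat that as a sketch to adapt, not as the way the answer above was implemented.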