Using the expand() function in snakemake to perform a shell command multiple times


I would like to run an R script multiple times on different input files with the help of Snakemake. To do this, I tried using the expand function.

I am relatively new to Snakemake and, if I understand it correctly, the expand function gives me a list of input files, which are then all concatenated and made available via {input}.

Is it possible to call the shell command on the files one by one?

Let's say I have this definition in my config.yaml:

types:
    - "A"
    - "B" 

This would be my example rule:

rule manual_groups:
    input:
        expand("chip_{type}.bed",type=config["types"])
    output:
        expand("data/p_chip_{type}.model",type=config["types"])
    shell:
        "Rscript scripts/pre_process.R {input}"

This would lead to the command:

Rscript scripts/pre_process.R chip_A.bed chip_B.bed
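
To see why the rule above collapses into a single command, here is a minimal plain-Python sketch. It is not Snakemake's implementation: `expand_sketch` is a hypothetical stand-in that only handles one wildcard, and the `types` list is assumed to mirror `config["types"]`.

```python
# Plain-Python stand-in for what expand() and {input} do in the rule above.
# This is NOT Snakemake's implementation, only an illustration.

def expand_sketch(pattern, **wildcards):
    """Fill a single-wildcard pattern once per value (hypothetical helper)."""
    name, values = next(iter(wildcards.items()))
    return [pattern.format(**{name: value}) for value in values]


types = ["A", "B"]  # assumed to mirror config["types"] from config.yaml

# expand() yields one file name per type, all attached to the same job
inputs = expand_sketch("chip_{type}.bed", type=types)

# {input} in a shell string is rendered as the file names joined by spaces
command = "Rscript scripts/pre_process.R " + " ".join(inputs)

print(inputs)
print(command)
```

Because the whole list belongs to one job, the shell string is rendered once with every file in it, which matches the single command shown above.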

Is it possible to instead run the command twice, once for each type, like this:

Rscript scripts/pre_process.R chip_A.bed
Rscript scripts/pre_process.R chip_B.bed

Thank you for any help in advance!


Solution

  • Define the final target files in rule all, and use a plain wildcard (here, type) instead of expand() in rule manual_groups. Snakemake then runs rule manual_groups as a separate job for each output file requested in rule all.

    rule all:
        input:
            expand("data/p_chip_{type}.model",type=config["types"])
    
    
    rule manual_groups:
        input:
            "chip_{type}.bed"
        output:
            "data/p_chip_{type}.model"
        shell:
            "Rscript scripts/pre_process.R {input}"
    

    PS: You may want to rename the wildcard type to avoid a potential conflict with Python's built-in type function.
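
The per-file behaviour of the solution can be sketched in plain Python. This is only a rough model, not Snakemake's actual code: `match_output` is a hypothetical helper, and the `targets` list is assumed to be what rule all requests for types A and B. Each target is matched against the output pattern, the wildcard value is read off, and the input and command are filled in once per target.

```python
import re


def match_output(pattern, target):
    """Return wildcard values if target matches the output pattern, else None."""
    # Split "data/p_chip_{type}.model" into literal parts and wildcard names;
    # odd positions hold the wildcard names captured by the group.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        regex += f"(?P<{part}>.+)" if i % 2 else re.escape(part)
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None


# Assumed targets: what rule all requests for types A and B
targets = ["data/p_chip_A.model", "data/p_chip_B.model"]

commands = []
for target in targets:
    # One job per target: the wildcard is inferred from the requested output...
    wildcards = match_output("data/p_chip_{type}.model", target)
    # ...and then substituted into the input pattern of the same rule.
    input_file = "chip_{type}.bed".format(**wildcards)
    commands.append(f"Rscript scripts/pre_process.R {input_file}")

print("\n".join(commands))
```

Under these assumptions the loop produces one command for chip_A.bed and one for chip_B.bed, which is the behaviour asked for in the question.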