How to write a Snakemake rule all whose expand statements can handle the absence of particular input files


I want to write a Snakemake pipeline that processes short-read sequencing files, long-read sequencing files, or both, depending on which types are provided as input. First, my Snakefile calls a shell script that creates a config file listing the names of all short-read files in the input directory under the heading short_reads and all long-read files under the heading long_reads. This is followed by my all rule:

    rule all:
        input:
            expand("../qc/id/{sample}/fastqc_raw/{sample}_R1_fastqc.html", sample=config["samples_short"]),
            expand("../qc/id/{sample}/nanoplot_raw/NanoPlot-report.html", sample=config["samples_long"])
            ...

However, if one of the file types (long or short reads) is not provided, Snakemake fails with a KeyError. If I modify the config file so that the heading is still present but has no sample names under it, Snakemake evaluates the input with the value None, e.g.

Missing input files for rule nanoplot_raw: ../raw_reads/None_ont.fastq.gz

How can I design the all rule so that it can handle only short reads, only long reads, or both sequence types as input?

Thanks for your help!


Solution

  • Does the following work?

    # config.get() returns None when the heading is missing (no KeyError);
    # an empty heading also yields None, so both cases take the else branch.
    if config.get("samples_short"):
        fastqc_short = expand("../qc/id/{sample}/fastqc_raw/{sample}_R1_fastqc.html", sample=config["samples_short"])
    else:
        fastqc_short = []

    if config.get("samples_long"):
        nanoplot_long = expand("../qc/id/{sample}/nanoplot_raw/NanoPlot-report.html", sample=config["samples_long"])
    else:
        nanoplot_long = []
    
    rule all:
        input:
            fastqc_short,
            nanoplot_long,
            ...
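If the same check is needed for several headings, it could be factored into a small helper. The following is only a sketch in plain Python (valid inside a Snakefile as well); the function name samples_for is hypothetical and not part of Snakemake. It assumes that a missing heading should be treated the same as an empty one, and that a heading holding a single sample name may be parsed as a plain string rather than a list.

```python
def samples_for(config, key):
    """Return the sample names stored under `key` as a list.

    Hypothetical helper: a missing heading and an empty heading
    (value None) both give an empty list.
    """
    value = config.get(key)  # None if the heading is absent, so no KeyError
    if value is None:
        return []
    if isinstance(value, str):
        # A single sample name given as a bare string: wrap it in a list.
        return [value]
    return list(value)


# Example config as it might look after parsing: short reads present,
# long-read heading present but empty.
example_config = {"samples_short": ["s1", "s2"], "samples_long": None}

short_samples = samples_for(example_config, "samples_short")
long_samples = samples_for(example_config, "samples_long")
```

In the Snakefile, the expand calls would then take sample=samples_for(config, "samples_short") and sample=samples_for(config, "samples_long"). As far as I know, expand returns an empty list when it is given an empty list of values, so the all rule would simply receive no targets for the missing read type.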