Conditional execution of Snakemake rules based on a column in a metadata table


I'm trying to use a column in a text file to conditionally execute rules in a Snakemake workflow.

The text file (config/samples.tsv, tab-separated) is as follows:

id  end sample_name fq1 fq2
a   paired  test_paired resources/SRR1945436_1.fastq.gz resources/SRR1945436_2.fastq.gz
b   single  test_single resources/SRR1945436.fastq.gz   NA

For each sample in the text file, I would like to use rule cp_fastq_pe to process both fq1 and fq2 if the value in the end column is paired, and rule cp_fastq_se to process only fq1 if the value is single.
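One way to express this choice is to derive, for each sample, which output files should be requested: the `_1`/`_2` pair for paired samples and the single file otherwise. The sketch below uses the standard-library `csv` module in place of pandas so that it runs on its own; the helper name `targets_for` is hypothetical, and the paths simply mirror the output patterns of the two rules in the Snakefile.

```python
import csv
import io

# Sample table from the question, tab-separated as in config/samples.tsv
SAMPLES_TSV = (
    "id\tend\tsample_name\tfq1\tfq2\n"
    "a\tpaired\ttest_paired\tresources/SRR1945436_1.fastq.gz\tresources/SRR1945436_2.fastq.gz\n"
    "b\tsingle\ttest_single\tresources/SRR1945436.fastq.gz\tNA\n"
)


def targets_for(rows):
    """Return the output files to request, chosen by each row's `end` value (hypothetical helper)."""
    targets = []
    for row in rows:
        if row["end"] == "paired":
            # Matches the outputs of rule cp_fastq_pe
            targets.append(f"resources/fq/{row['id']}_1.fq.gz")
            targets.append(f"resources/fq/{row['id']}_2.fq.gz")
        elif row["end"] == "single":
            # Matches the output of rule cp_fastq_se
            targets.append(f"resources/fq/{row['id']}.fq.gz")
        else:
            raise ValueError(f"unknown end value {row['end']!r} for sample {row['id']}")
    return targets


rows = list(csv.DictReader(io.StringIO(SAMPLES_TSV), delimiter="\t"))
print(targets_for(rows))
# ['resources/fq/a_1.fq.gz', 'resources/fq/a_2.fq.gz', 'resources/fq/b.fq.gz']
```

In the Snakefile itself, a list like this (built from the pandas `samples` table instead) could serve as the `input` of a target rule such as `rule all`, so each sample is routed to a rule by the file name it asks for. The paired targets still fit both rules' output patterns, which is the ambiguity the answer addresses with `ruleorder`.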

The relevant part of the Snakefile is as follows:

import pandas as pd
samples = pd.read_table("config/samples.tsv").set_index("id", drop=False)
all_ids = list(samples["id"])

rule cp_fastq_pe:
    """
    copy file to resources
    """
    input:
        fq1=lambda wildcards: samples.loc[wildcards.id, "fq1"],
        fq2=lambda wildcards: samples.loc[wildcards.id, "fq2"]
    output:
        "resources/fq/{id}_1.fq.gz",
        "resources/fq/{id}_2.fq.gz"
    shell:
        """
        cp {input.fq1} {output[0]}
        cp {input.fq2} {output[1]}
        """

rule cp_fastq_se:
    """
    copy file to resources
    """
    input:
        fq1=lambda wildcards: samples.loc[wildcards.id, "fq1"]
    output:
        "resources/fq/{id}.fq.gz",
    shell:
        """
        cp {input.fq1} {output}
        """

Is it possible to do this?
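Note that the output patterns of the two rules above overlap. Assuming Snakemake's default wildcard pattern `.+`, which lets `{id}` match any non-empty string including underscores, a file such as `resources/fq/a_1.fq.gz` fits both rules: cp_fastq_pe with `id=a` and cp_fastq_se with `id=a_1`. The sketch below imitates that matching with Python's `re` module; the helpers `pattern_to_regex` and `match_wildcards` are hypothetical and are not Snakemake's own implementation.

```python
import re

# Output patterns of the two rules in the question's Snakefile
PE_OUTPUT = "resources/fq/{id}_1.fq.gz"
SE_OUTPUT = "resources/fq/{id}.fq.gz"


def pattern_to_regex(pattern, wildcard_regex=".+"):
    """Compile a '{name}' pattern into a regex (a simplified imitation, not Snakemake's code)."""
    # re.split with a capture group alternates literal text and wildcard names
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)  # literal text between wildcards
        else:
            regex += f"(?P<{part}>{wildcard_regex})"  # named group for the wildcard
    return re.compile(regex)


def match_wildcards(pattern, path, wildcard_regex=".+"):
    """Return the wildcard values if `path` fits `pattern` completely, else None."""
    m = pattern_to_regex(pattern, wildcard_regex).fullmatch(path)
    return m.groupdict() if m else None


target = "resources/fq/a_1.fq.gz"
print(match_wildcards(PE_OUTPUT, target))  # {'id': 'a'}
print(match_wildcards(SE_OUTPUT, target))  # {'id': 'a_1'}, so both rules could make it
# Forbidding underscores (and slashes) in {id} removes the overlap
print(match_wildcards(SE_OUTPUT, target, wildcard_regex="[^_/]+"))  # None
```

Besides `ruleorder`, Snakemake's `wildcard_constraints` directive can restrict `{id}` in a similar way (for example to `[^_/]+`), which only works if no sample ID contains an underscore.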


Solution

  • I had a similar problem, which I solved here: "How to make Snakemake input optional but not empty?"

    Here is the idea adjusted to your problem. First, you need to specify a ruleorder to resolve the ambiguity (otherwise the single-end rule could be applied whenever the paired-end rule is possible):

    ruleorder: cp_fastq_pe > cp_fastq_se
    

    Next, in your cp_fastq_pe rule you need to define an input function that returns either a valid file (in the paired case) or the name of a non-existing placeholder file:

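    rule cp_fastq_pe:
        input:
            fq1=lambda wildcards: samples.loc[wildcards.id, "fq1"],
            # pandas reads "NA" as NaN, so test the cell value; the "fq2" column always exists
            fq2=lambda wildcards: samples.loc[wildcards.id, "fq2"] if pd.notna(samples.loc[wildcards.id, "fq2"]) else "non-existing-filename"
        # output: and shell: stay as in the original rule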
    

    This rule would be applied to every sample whose fq2 field holds a valid file path. The other rule would be selected for the rest of the samples.
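The placeholder trick only works if a real fq2 path can be told apart from a missing one. With default settings, `pandas.read_table` converts the literal `NA` in the sample sheet to NaN, so the test belongs on the cell value rather than on whether an fq2 column exists. Below is a stdlib-only sketch of that check; `is_missing` and `fq2_or_placeholder` are hypothetical helpers, and in a Snakefile that already imports pandas, `pd.notna(value)` does the same job for NaN.

```python
import math

PLACEHOLDER = "non-existing-filename"  # a name no rule produces and no file on disk has


def is_missing(value):
    """True for None, NaN (how pandas stores 'NA' cells by default), '' or a literal 'NA'."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    # Covers tables read with NaN conversion switched off
    return isinstance(value, str) and value.strip() in {"", "NA"}


def fq2_or_placeholder(value):
    """Return the fq2 path for paired samples, otherwise the unsatisfiable placeholder."""
    return PLACEHOLDER if is_missing(value) else value


print(fq2_or_placeholder("resources/SRR1945436_2.fastq.gz"))  # resources/SRR1945436_2.fastq.gz
print(fq2_or_placeholder(float("nan")))  # non-existing-filename
```

Inside the rule, this could be wired in as `fq2=lambda wildcards: fq2_or_placeholder(samples.loc[wildcards.id, "fq2"])`.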