
Is it possible to have an optional output file in Snakemake?


I'm writing a Snakemake rule that performs FASTQ trimming on either single-end or paired-end sequencing data. If the data is paired-end, there should be two output files; if it is single-end, there should be one.

The rule I have written works so far; however, the second trimmed file of the pair is not declared as an output. This means Snakemake does not check whether that file exists: it gets written, but it is never verified. Is it possible to have an optional output?

    input:
        # get the value in the fast1 column
        fastq_file = lambda wildcards: return_fastq(wildcards.fastq_name,wildcards.unit,first_pair = True)
    output:
        out_fastqc = config["fastp_trimmed_output_folder"] + "{unit}/{fastq_name}_trimmed.fastq.gz",
        fastpjson = config["fastp_trimmed_output_folder"] + "{unit}/{fastq_name}_fastp.json",
        fastphtml = config["fastp_trimmed_output_folder"] + "{unit}/{fastq_name}_fastp.html"
    params:
        fastp_parameters = return_parsed_extra_params(config['fastp_parameters']),
        fastq_file2 = lambda wildcards: return_fastq(wildcards.fastq_name,wildcards.unit,first_pair = False),
        out_fastqc2 = lambda wildcards: return_fastq2_name(wildcards.fastq_name,wildcards.unit),
        fastpjson = config["fastp_trimmed_output_folder"] + "{unit}/{fastq_name}_fastp.json",
        fastphtml = config["fastp_trimmed_output_folder"] + "{unit}/{fastq_name}_fastp.html"
    run:
        if config["end_type"] == "se":
            shell("{config[fastp_path]} -i {input.fastq_file} -o {output.out_fastqc} --json {output.fastpjson} --html {output.fastphtml} {params.fastp_parameters}")
        if config["end_type"] == "pe":
            shell("{config[fastp_path]} --in1 {input.fastq_file} --in2 {params.fastq_file2} --out1 {output.out_fastqc} --out2  {params.out_fastqc2} --json {output.fastpjson} --html {output.fastphtml} {params.fastp_parameters}")
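The two branches of the run: block differ only in which fastp flags are passed and whether a second input/output pair is involved. Here is a plain-Python sketch of that branching (not Snakemake code); build_fastp_cmd is a hypothetical helper, and the flags are the ones the rule already uses. Unlike the original, which silently runs nothing when end_type is neither "se" nor "pe", this sketch raises an error.

```python
from typing import List, Optional


def build_fastp_cmd(
    fastp_path: str,
    end_type: str,
    in1: str,
    out1: str,
    json_report: str,
    html_report: str,
    in2: Optional[str] = None,
    out2: Optional[str] = None,
    extra: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: the rule's run: block expressed as an argument list."""
    extra = extra or []
    if end_type == "se":
        # Single-end: one input, one trimmed output.
        cmd = [fastp_path, "-i", in1, "-o", out1]
    elif end_type == "pe":
        # Paired-end: both mates in, both trimmed mates out.
        if in2 is None or out2 is None:
            raise ValueError("paired-end mode needs in2 and out2")
        cmd = [fastp_path, "--in1", in1, "--in2", in2, "--out1", out1, "--out2", out2]
    else:
        # The original rule would silently do nothing here.
        raise ValueError(f"unknown end_type: {end_type!r}")
    # Reports and any extra user parameters are shared by both modes.
    return cmd + ["--json", json_report, "--html", html_report] + extra
```

Returning a list keeps the two modes easy to compare; only the paired-end branch ever mentions a second output file, which is exactly the file Snakemake is not told about.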

The goal is for out_fastqc2 to be included as an optional output of the rule, so that Snakemake checks whether it exists and gives me an error if it doesn't.
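To make the problem concrete: after a job finishes, Snakemake verifies only the files declared under output: and fails the job if any of them are missing; anything passed through params: is invisible to that check. The toy model below is plain Python, not Snakemake's implementation; missing_outputs and the mate-2 file name are hypothetical.

```python
import os
import tempfile
from typing import Dict, List


def missing_outputs(declared: Dict[str, str]) -> List[str]:
    """Toy model: return the names of declared outputs whose files do not exist."""
    return [name for name, path in declared.items() if not os.path.exists(path)]


with tempfile.TemporaryDirectory() as tmp:
    trimmed1 = os.path.join(tmp, "s_trimmed.fastq.gz")
    trimmed2 = os.path.join(tmp, "s_2_trimmed.fastq.gz")  # hypothetical mate-2 name
    open(trimmed1, "w").close()  # pretend fastp wrote only mate 1

    # Mate 2 passed via params: it is not part of the check, so nothing is reported.
    print(missing_outputs({"out_fastqc": trimmed1}))  # []
    # Mate 2 declared as an output: the missing file is caught.
    print(missing_outputs({"out_fastqc": trimmed1, "out_fastqc2": trimmed2}))  # ['out_fastqc2']
```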

If Snakemake doesn't allow optional outputs, I could just split this into two rules, but that's not quite what I'd like.


Solution

  • Look at how the expand function works. It is evaluated when Snakemake parses the Snakefile, before the DAG of dependencies is built, and its result (a plain list of file names) is used as the list of files in the output section.

    I suggest doing the same: construct a list that is either empty or not, depending on the condition.

    This only works if you know in advance whether you need out_fastqc2 (although defining two rules with priorities achieves the same). If you only learn whether out_fastqc2 is needed while the rule is running, that is a completely different case, and you need checkpoints.

    The code below illustrates my approach: out_fastqc2 becomes a string describing the file if end_type is configured as "pe"; otherwise it becomes an empty list, which leaves the list of outputs unchanged.

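    output:
        out_fastqc = config["fastp_trimmed_output_folder"] + "{unit}/{fastq_name}_trimmed.fastq.gz",
        # Output entries cannot be functions (lambdas); this condition is evaluated once, when the Snakefile is parsed.
        # The "_R2_trimmed" name is a hypothetical pattern; make it match what return_fastq2_name produces.
        out_fastqc2 = config["fastp_trimmed_output_folder"] + "{unit}/{fastq_name}_R2_trimmed.fastq.gz" if config["end_type"] == "pe" else []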
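The mechanism can be reproduced in plain Python: whatever the output expression evaluates to when the Snakefile is read is what Snakemake receives, and an empty list contributes no files. This sketch uses simple_expand, a simplified stand-in for Snakemake's expand (not its real implementation), and a hypothetical declared_outputs helper; the "_R2_trimmed" mate-2 name is likewise hypothetical.

```python
import itertools
from typing import Dict, List, Union


def simple_expand(pattern: str, **wildcards: List[str]) -> List[str]:
    """Simplified stand-in for expand(): fill the pattern with every
    combination of wildcard values and return a plain list."""
    keys = list(wildcards)
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in itertools.product(*(wildcards[k] for k in keys))
    ]


def declared_outputs(config: Dict[str, str]) -> Dict[str, Union[str, List[str]]]:
    """Hypothetical helper mimicking the output section above; evaluated once."""
    folder = config["fastp_trimmed_output_folder"]
    return {
        "out_fastqc": folder + "{unit}/{fastq_name}_trimmed.fastq.gz",
        # Hypothetical mate-2 naming; an empty list adds no file in single-end mode.
        "out_fastqc2": folder + "{unit}/{fastq_name}_R2_trimmed.fastq.gz"
        if config["end_type"] == "pe"
        else [],
    }


def flatten(outputs: Dict[str, Union[str, List[str]]]) -> List[str]:
    """Collect every declared file pattern into one flat list."""
    files: List[str] = []
    for value in outputs.values():
        files.extend([value] if isinstance(value, str) else value)
    return files


print(simple_expand("{unit}/{fastq_name}.txt", unit=["u1"], fastq_name=["a", "b"]))
# ['u1/a.txt', 'u1/b.txt']
print(flatten(declared_outputs({"fastp_trimmed_output_folder": "trimmed/", "end_type": "se"})))
# ['trimmed/{unit}/{fastq_name}_trimmed.fastq.gz']
```

With end_type set to "se", only the mate-1 pattern is declared, so Snakemake checks one trimmed file; with "pe", it also checks the mate-2 file. In the paired-end branch of the run: block, the second file would then be referenced as {output.out_fastqc2} instead of {params.out_fastqc2}.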