Search code examples
workflowsnakemake

Execute certain rule at the very end


I am currently writing a Snakefile, which does a lot of post-alignment quality control (CollectInsertSizeMetics, CollectAlignmentSummaryMetrics, CollectGcBiasMetrics, ...). At the very end of the Snakefile, I am running multiQC to combine all the metrics in one html report.

I know that if I use the output of rule A as input of rule B, rule B will only be executed after rule A is finished. The problem in my case is that the input of multiQC is a directory, which exists right from the start. Inside of this directory, multiQC will search for certain files and then create the report. If I am currently executing my Snakemake file, multiQC will be executed before all quality controls will be performed (e.g. fastqc takes quite some time), thus these are missing in the final report.

So my question is, if there is an option, that specifies that a certain rule is executed last. I know that I could use --wait-for-files to wait for a certain fastqc report, but that seems very inflexible.

The last rule currently looks like this:

rule multiQC:
    input:
        input_dir = "post-alignment-qc"

    output:
        output_html="post-alignment-qc/multiQC/mutliqc-report.html"

    log:
        err='post-alignment-qc/logs/fastQC/multiqc_stderr.err'

    benchmark:
        "post-alignment-qc/benchmark/multiQC/multiqc.tsv"

    shell:
         "multiqc -f -n {output.output_html} {input.input_dir} 2> {log.err}"

Any help is appreciated!


Solution

  • You could give to the input of multiqc rule the files produced by the individual QC rules. In this way, multiqc will start once all those files are available:

    samples = ['a', 'b', 'c']
        
    rule collectInsertSizeMetrics:
            input:
                '{sample}.bam',
            output:
                'post-alignment-qc/{sample}.insertSizeMetrics.txt' # <- Some file produced by CollectInsertSizeMetrics
            shell: 
                "CollectInsertSizeMetics {input} > {output}"
        
        rule CollectAlignmentSummaryMetrics:
            output: 
                'post-alignment-qc/{sample}.CollectAlignmentSummaryMetrics.txt'
    
        rule multiqc:
            input:
                expand('post-alignment-qc/{sample}.insertSizeMetrics.txt', sample=samples),
                expand('post-alignment-qc/{sample}.CollectAlignmentSummaryMetrics.txt', sample=samples),
            shell:
                "multiqc -f -n {output.output_html} post-alignment-qc 2> {log.err}"