Running multiple snakemake rules


I would like to run multiple rules one after another using Snakemake. However, when I run this script, the bam_list rule runs before the samtools_markdup rule and fails with an error that it cannot find its input files, which obviously have not been generated yet. How can I solve this problem?

rule all:
    input: 
        expand("dup/{sample}.dup.bam", sample=SAMPLES),
        "dup/bam_list"

rule samtools_markdup:
    input:
        sortbam ="rg/{sample}.rg.bam"
    output:
        dupbam = "dup/{sample}.dup.bam"
    threads: 5
    shell:
        """
        samtools markdup -@ {threads} {input.sortbam} {output.dupbam}
        """

rule bam_list:
    output:
         outlist = "dup/bam_list"
    shell:
         """
         ls dup/*.bam > {output.outlist}
         """

Solution

  • Snakemake is doing exactly what it was told: rule all asks for dup/bam_list, and bam_list declares no inputs, so Snakemake is free to run it before samtools_markdup. I think what you mean to have is:

    rule all:
        input: 
            "dup/bam_list"
    
    rule samtools_markdup:
        input:
            sortbam ="rg/{sample}.rg.bam"
        output:
            dupbam = "dup/{sample}.dup.bam"
        threads: 5
        shell:
            """
            samtools markdup -@ {threads} {input.sortbam} {output.dupbam}
            """
    
    rule bam_list:
        input: 
            expand("dup/{sample}.dup.bam", sample=SAMPLES)
        output:
            outlist = "dup/bam_list"
        shell:
            """
            ls dup/*.bam > {output.outlist}
            """
    

    Now bam_list will not run until every samtools_markdup job has completed, because it declares their outputs as its input. As an aside, I expect the contents of dup/bam_list to be identical to expand("dup/{sample}.dup.bam", sample=SAMPLES) (assuming dup/ holds no other BAM files), so if you use the file later in the workflow you can probably just use the expand output.
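
To illustrate that aside, here is a minimal Python sketch of writing the list from the known sample names instead of globbing the directory with ls. The helper names expected_dup_bams and write_bam_list and the sample names are hypothetical, not part of the original workflow. The list comprehension only mimics what expand does for a single wildcard.

```python
from pathlib import Path

# Hypothetical sample names; in the real workflow SAMPLES comes from the Snakefile.
SAMPLES = ["s1", "s2", "s3"]


def expected_dup_bams(samples):
    """Build the paths that expand("dup/{sample}.dup.bam", sample=samples)
    would produce for a single wildcard (plain-Python stand-in, no Snakemake needed)."""
    return [f"dup/{sample}.dup.bam" for sample in samples]


def write_bam_list(bam_paths, out_path):
    """Write one BAM path per line to out_path, creating the parent directory if needed."""
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("".join(f"{path}\n" for path in bam_paths))
```

Calling write_bam_list(expected_dup_bams(SAMPLES), "dup/bam_list") would list only the BAMs the workflow declares, whereas ls dup/*.bam also picks up any unrelated BAM that happens to sit in dup/. Inside a Snakemake run: block the same idea should work by passing the rule's input and output.outlist to such a helper, though that variant is untested here.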