Search code examples
snakemake

Merging two rules to one rule with wildcards/dictionary


I am mapping and counting sequences that are mixed from two species. I have two rules in my pipeline because each specie has it's own GTF file. I would like to be able to merge this into one rule, and then have the specie wildcard dictate which GTF file to use (using a dictionary for example). These are the rules:

rule count_reads_human:
    input:
        ibam = "sorted_reads/human/{spc}_sorted.bam",
        gtf = "/nadata/users/username/annotations/hg38/gencode.v38.annotation.gtf"
    output:
        bam = "counted_reads/human/{spc}_countBAM.bam",
        htout = "counted_reads/human/{spc}_htseq_count.out"
    log:
        "logs/count_reads/human/{spc}.log"
    shell:
        "htseq-count -o {output.bam} -p bam {input.ibam} {input.gtf} > {output.htout} 2> {log}"

rule count_reads_mouse:
    input:
        ibam = "sorted_reads/mouse/{spc}_sorted.bam",
        gtf = "/nadata/users/username/annotations/mm10/gencode.vM20.annotation.gtf"
    output:
        bam = "counted_reads/mouse/{spc}_countBAM.bam",
        htout = "counted_reads/mouse/{spc}_htseq_count.out"
    log:
        "logs/count_reads/mouse/{spc}.log"
    shell:
        "htseq-count -o {output.bam} -p bam {input.ibam} {input.gtf} > {output.htout} 2> {log}"

What I would like to do is something of the sort:

gtfDict = {"human": "path/to/human/gtf/file.gtf", "mouse": "/path/to/mouse/gtf/file.gtf"}

rule count_reads:
    input:
        ibam = "sorted_reads/{specie}/{spc}_sorted.bam",
        gtf = gtfDict[{specie}]
    output:
        bam = "counted_reads/{specie}/{spc}_countBAM.bam",
        htout = "counted_reads/{specie}/{spc}_htseq_count.out"
    log:
        "logs/count_reads/{specie}/{spc}.log"
    shell:
        "htseq-count -o {output.bam} -p bam {input.ibam} {input.gtf} > {output.htout} 2> {log}"

My merged rule however does not work.


Solution

  • Figured it out. Need to use lambda:

    gtfDict = {"human": "/path/to/human.gtf",
               "mouse": "/path/to/mouse.gtf"}
    
    rule count_reads:
        input:
            ibam = "sorted_reads/{specie}/{spc}_sorted.bam",
            gtf = lambda wildcards: gtfDict[wildcards.specie]
        output:
            bam = "counted_reads/{specie}/{spc}_countBAM.bam",
            htout = "counted_reads/{specie}/{spc}_htseq_count.out"
        log:
            "logs/count_reads/{specie}/{spc}.log"
        shell:
            "htseq-count -o {output.bam} -p bam {input.ibam} {input.gtf} > {output.htout} 2> {log}"