python wildcard bioinformatics snakemake

Snakemake scatter-gather with wildcard AmbiguousRuleException


I am trying to use Snakemake's scatter-gather feature. The documentation for it is fairly basic, so I modified my code according to what is mentioned in this link:

rule fastq_fasta:
    input:rules.trimmomatic.output.out_file
    output:"data/trimmed/{sample}.fasta"
    shell:"sed -n '1~4s/^@/>/p;2~4p' {input} > {output}"

rule split:
    input:
        "data/trimmed/{sample}.fasta"
    params:
        scatter_count=config["scatter_count"],
        scatter_item = lambda wildcards: wildcards.scatteritem
    output:
        temp(scatter.split("data/trimmed/{{sample}}_{scatteritem}.fasta"))
    script:
        "scripts/split_files.py"
        
rule process:
    input:"data/trimmed/{sample}_{scatteritem}.fasta"
    output:"data/processed/{sample}_{scatteritem}.csv"
    script:
        "scripts/process.py"

rule gather:
    input:
        gather.split("data/processed/{{sample}}_{scatteritem}.csv")
    output:
        "data/processed/{sample}.csv"
    shell:
        "cat {input} > {output}"

I added the wildcard option, but I got:

AmbiguousRuleException: Rules fastq_to_fasta(which is previous rule) and split are ambiguous for the file data/trimmed/Ornek_411-of-81-of-81-of-81-of-81-of-81-of-81-of-81-of-81-of-8.fasta

I tried lots of things, but either the rules are not called or I get an AmbiguousRuleException. What am I missing? Can someone help?


Solution

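  • Snakemake cannot tell which rule should generate the file. An unconstrained wildcard matches any non-empty string, so the output pattern of the previous rule, data/trimmed/{sample}.fasta, also matches the scattered files data/trimmed/{sample}_{scatteritem}.fasta that split produces in the same directory. An easy fix (if feasible) is to use a different path for scattered items: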

    rule split:
        input:
            "data/trimmed/{sample}.fasta"
        params:
            scatter_count=config["scatter_count"],
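            # NOTE: scatter.split() expands {scatteritem} itself, so {sample} is the
            # only wildcard of this rule; this lambda will likely fail and may need to be dropped.
            scatter_item = lambda wildcards: wildcards.scatteritem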
        output:
            temp(scatter.split("data/trimmed_scatter/{{sample}}_{scatteritem}.fasta"))
        script:
            "scripts/split_files.py"
            
    rule process:
        input:"data/trimmed_scatter/{sample}_{scatteritem}.fasta"
        output:"data/processed/{sample}_{scatteritem}.csv"
        script:
            "scripts/process.py"
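
To see why the original layout is ambiguous and the separate directory is not, here is a minimal sketch in plain Python. It only approximates Snakemake's matching, under the assumption that each unconstrained wildcard behaves like the regex `.+`. The helper names (`pattern_to_regex`, `producers`), the sample name `Ornek`, and the scatter item `1-of-8` are made up for illustration and are not part of Snakemake.

```python
import re


def pattern_to_regex(pattern):
    """Approximate a Snakemake output pattern: each {name} becomes (?P<name>.+)."""
    # re.split with a capture group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pieces.append(re.escape(part))      # literal path text
        else:
            pieces.append(f"(?P<{part}>.+)")    # unconstrained wildcard
    return re.compile("".join(pieces) + "$")


def producers(target, rules):
    """Return the names of all rules whose output pattern matches the target."""
    return [name for name, regex in rules.items() if regex.match(target)]


# Original layout: the scattered files live next to the un-split FASTA file.
old_rules = {
    "fastq_fasta": pattern_to_regex("data/trimmed/{sample}.fasta"),
    # one of the outputs that scatter.split() expands to (hypothetical item)
    "split": pattern_to_regex("data/trimmed/{sample}_1-of-8.fasta"),
}

# Proposed layout: the scattered files go to their own directory.
new_rules = {
    "fastq_fasta": pattern_to_regex("data/trimmed/{sample}.fasta"),
    "split": pattern_to_regex("data/trimmed_scatter/{sample}_1-of-8.fasta"),
}

old_target = "data/trimmed/Ornek_1-of-8.fasta"
new_target = "data/trimmed_scatter/Ornek_1-of-8.fasta"

print(producers(old_target, old_rules))  # ['fastq_fasta', 'split'] -> ambiguous
print(producers(new_target, new_rules))  # ['split'] -> unambiguous
```

In the original layout, `{sample}` in the first rule can absorb the whole string `Ornek_1-of-8`, so two rules claim the same file. If moving the scattered files is not feasible, another route (not shown here) would be to restrict what `{sample}` may match with Snakemake's `wildcard_constraints`.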