I am using Snakemake's scatter-gather feature. The documentation for it is basic, so I modified my code according to what is mentioned in this link:
    rule fastq_fasta:
        input:
            rules.trimmomatic.output.out_file
        output:
            "data/trimmed/{sample}.fasta"
        shell:
            "sed -n '1~4s/^@/>/p;2~4p' {input} > {output}"

    rule split:
        input:
            "data/trimmed/{sample}.fasta"
        params:
            scatter_count=config["scatter_count"],
            scatter_item=lambda wildcards: wildcards.scatteritem
        output:
            temp(scatter.split("data/trimmed/{{sample}}_{scatteritem}.fasta"))
        script:
            "scripts/split_files.py"

    rule process:
        input:
            "data/trimmed/{sample}_{scatteritem}.fasta"
        output:
            "data/processed/{sample}_{scatteritem}.csv"
        script:
            "scripts/process.py"

    rule gather:
        input:
            gather.split("data/processed/{{sample}}_{scatteritem}.csv")
        output:
            "data/processed/{sample}.csv"
        shell:
            "cat {input} > {output}"
I added the wildcard option, but I got:
AmbiguousRuleException: Rules fastq_to_fasta (which is the previous rule) and split are ambiguous for the file data/trimmed/Ornek_411-of-81-of-81-of-81-of-81-of-81-of-81-of-81-of-81-of-8.fasta
I tried lots of things, but either the rules are not called or I get an AmbiguousRuleException. What am I missing? Can someone help?
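It is ambiguous which rule should generate that file. By default a Snakemake wildcard matches any non-empty string, so the output pattern data/trimmed/{sample}.fasta of the FASTQ-to-FASTA rule also matches the scattered files data/trimmed/{sample}_{scatteritem}.fasta produced by split. An easy fix (if feasible) is to use a different path for the scattered items: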
    rule split:
        input:
            "data/trimmed/{sample}.fasta"
        params:
            scatter_count=config["scatter_count"],
            scatter_item=lambda wildcards: wildcards.scatteritem
        output:
            temp(scatter.split("data/trimmed_scatter/{{sample}}_{scatteritem}.fasta"))
        script:
            "scripts/split_files.py"

    rule process:
        input:
            "data/trimmed_scatter/{sample}_{scatteritem}.fasta"
        output:
            "data/processed/{sample}_{scatteritem}.csv"
        script:
            "scripts/process.py"