So my issue below is partially solved; however, now I'm trying to pass a variable as input in rule all and resolve it to get the dependent values as inputs in another rule. My code:
rule all:
    input:
        [f"outputs/STAR/all/{x}/counts_2.txt" for x in config["method"]]

rule feature_counts_per_sample:
    input:
        bam=[f"outputs/STAR/{name}/Aligned.sortedByCoord.out.sortedbyname.bam" for name in config["method"][{x}]],
        gtf="data/chr19_20Mb.gtf"
    output:
        outA="outputs/STAR/all/{x}/counts_1.txt",
        outB="outputs/STAR/all/{x}/counts_2.txt"
    shell:
        "mkdir -p outputs/STAR/all/{wildcards.x}/ && featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.outA} -s 1 {input.bam} && featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.outB} -s 2 {input.bam}"
The problem is with input.bam: I get a "name 'x' is not defined" error and cannot find a way to resolve it. Otherwise I know the code works, because if I replace {x} with a constant value I get the expected results. Is there a way to do this, or should I be looking for a completely different approach?
I'm having trouble accessing nested values from my config.yaml file. My config.yaml:
method:
  collibri:
    - Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R
    - Collibri_standard_protocol-HBR-Collibri-100_ng-3_S2_L001_R
    - Collibri_standard_protocol-UHRR-Collibri-100_ng-2_S3_L001_R
    - Collibri_standard_protocol-UHRR-Collibri-100_ng-3_S4_L001_R
  kapa:
    - KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-3_S8_L001_R
    - KAPA_mRNA_HyperPrep_-HBR-KAPA-100_ng_total_RNA-2_S5_L001_R
    - KAPA_mRNA_HyperPrep_-HBR-KAPA-100_ng_total_RNA-3_S6_L001_R
    - KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-2_S7_L001_R
num:
  - 1
  - 2
type:
  - collibri
  - kapa
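Once Snakemake loads this file through configfile:, the nested values are ordinary Python dicts and lists in the global config object, so config["method"]["kapa"] is simply the list of KAPA sample names. A plain-Python sketch of that lookup follows; the group_inputs helper and BAM_PATTERN constant are hypothetical and only illustrate which BAM files are meant to feed which method folder.

```python
# Same nested structure that Snakemake builds from config.yaml.
config = {
    "method": {
        "collibri": [
            "Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R",
            "Collibri_standard_protocol-HBR-Collibri-100_ng-3_S2_L001_R",
            "Collibri_standard_protocol-UHRR-Collibri-100_ng-2_S3_L001_R",
            "Collibri_standard_protocol-UHRR-Collibri-100_ng-3_S4_L001_R",
        ],
        "kapa": [
            "KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-3_S8_L001_R",
            "KAPA_mRNA_HyperPrep_-HBR-KAPA-100_ng_total_RNA-2_S5_L001_R",
            "KAPA_mRNA_HyperPrep_-HBR-KAPA-100_ng_total_RNA-3_S6_L001_R",
            "KAPA_mRNA_HyperPrep_-UHRR-KAPA-100_ng_total_RNA-2_S7_L001_R",
        ],
    },
    "num": [1, 2],
    "type": ["collibri", "kapa"],
}

# Hypothetical template for the name-sorted BAM of one sample.
BAM_PATTERN = "outputs/STAR/{name}/Aligned.sortedByCoord.out.sortedbyname.bam"


def group_inputs(config):
    """Map each method name to the BAM files of all samples in that group."""
    return {
        method: [BAM_PATTERN.format(name=name) for name in config["method"][method]]
        for method in config["type"]
    }


for method, bams in group_inputs(config).items():
    print(f"outputs/STAR/{method}/counts_2.txt <- {len(bams)} BAM files")
```

This prints one line per method, each listing 4 BAM files. Inside a Snakefile the same lookup has to wait until the method name is known, which is exactly the problem discussed in the answer.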
And my goal is to use all files from a method group as inputs at once and direct the output to a folder named after the method (e.g. run the rule on all names under 'kapa' at once and place the output in a 'kapa' folder). Shortened version of my Snakefile:
configfile: "config.yaml"

rule all:
    input:
        expand("outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam.bai", filename=config["method"]["collibri"]),
        expand("outputs/STAR/{filename}/counts_2.txt", filename=config["method"]["collibri"]),
        expand("outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam.bai", filename=config["method"]["kapa"]),
        expand("outputs/STAR/{filename}/counts_2.txt", filename=config["method"]["kapa"]),
        expand("outputs/STAR/{type}/counts_2.txt", type=config["type"])

rule bam_index:
    input:
        "outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam"
    output:
        "outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam.bai"
    shell:
        "samtools index {input}"

rule bam_sort_name:
    input:
        "outputs/STAR/{filename}/Aligned.sortedByCoord.out.bam"
    output:
        "outputs/STAR/{filename}/Aligned.sortedByCoord.out.sortedbyname.bam"
    shell:
        "samtools sort -n -o {output} {input}"

rule feature_counts:
    input:
        bam="outputs/STAR/{filename}/Aligned.sortedByCoord.out.sortedbyname.bam",
        gtf="data/chr19_20Mb.gtf"
    output:
        out1="outputs/STAR/{filename}/counts_1.txt",
        out2="outputs/STAR/{filename}/counts_2.txt"
    shell:
        "featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.out1} -s 1 {input.bam} && featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.out2} -s 2 {input.bam}"

rule feature_counts_per_sample:
    input:
        bam=expand("outputs/STAR/{name}/Aligned.sortedByCoord.out.sortedbyname.bam", name=config["method"][{type}]),
        gtf="data/chr19_20Mb.gtf"
    output:
        out1="outputs/STAR/{type}/counts_1.txt",
        out2="outputs/STAR/{type}/counts_2.txt"
    shell:
        "mkdir -p outputs/STAR/{type}/ && featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.out1} -s 1 {input.bam} && featureCounts -p -t exon -g gene_id -a {input.gtf} -o {output.out2} -s 2 {input.bam}"
So overall there are two issues that I cannot solve:
This line is wrong:
bam=[f"outputs/STAR/{name}/Aligned.sortedByCoord.out.sortedbyname.bam" for name in config["method"][{x}]],
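Snakemake knows the specific value of x only when it resolves a job, i.e. after matching a requested file such as outputs/STAR/all/kapa/counts_2.txt against the rule's output pattern. The list comprehension above, however, is evaluated immediately when the Snakefile is parsed. At that point {x} is not a wildcard at all but a Python set literal referring to a variable x that does not exist, hence the "name 'x' is not defined" error. To postpone the evaluation, you need an input function, for example with the lambda wildcards syntax: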
bam=lambda wildcards: [f"outputs/STAR/{name}/Aligned.sortedByCoord.out.sortedbyname.bam" for name in config["method"][wildcards.x]],
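To see why the lambda helps, here is a plain-Python sketch of the difference between eager and deferred evaluation. The Wildcards namedtuple and the shortened config dict are stand-ins for the objects Snakemake provides, not its real API; the last call mimics what Snakemake effectively does once it has matched outputs/STAR/all/kapa/counts_2.txt and knows that x is "kapa".

```python
from collections import namedtuple

# Stand-in for the object Snakemake passes to input functions (supports wildcards.x).
Wildcards = namedtuple("Wildcards", ["x"])

# Shortened copy of the config.yaml structure (one sample per method).
config = {
    "method": {
        "collibri": ["Collibri_standard_protocol-HBR-Collibri-100_ng-2_S1_L001_R"],
        "kapa": ["KAPA_mRNA_HyperPrep_-HBR-KAPA-100_ng_total_RNA-2_S5_L001_R"],
    }
}

# Eager evaluation, as in the original rule: the comprehension runs right away,
# and {x} is a set literal that looks up a variable x which does not exist.
try:
    bam = [f"outputs/STAR/{name}/Aligned.sortedByCoord.out.sortedbyname.bam"
           for name in config["method"][{x}]]
except NameError as err:
    print("parse-time error:", err)

# Deferred evaluation: the lambda body only runs when it is called with wildcards.
bam = lambda wildcards: [
    f"outputs/STAR/{name}/Aligned.sortedByCoord.out.sortedbyname.bam"
    for name in config["method"][wildcards.x]
]

# Roughly what Snakemake does after resolving the wildcard x to "kapa".
print(bam(Wildcards(x="kapa")))
```

In a real Snakefile you can either keep the lambda inside the input: section or define a named function with def and pass it as bam=my_function; both are input functions. The same change applies to feature_counts_per_sample: wrap the expand() call in lambda wildcards: and index with config["method"][wildcards.type], and refer to the folder in the shell command as {wildcards.type}.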