This is a similar problem: Snakemake: using a python nested list comprehension for conditional analyses
I have the following:
RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D","E","F"]]
rule all:
input:
summary = expand("foo/{run}/{sample}/outs/{sample}_report.html", run=RUN_ID, sample=SAMPLES)
Problem 1: each run in RUN_ID
should only associate with the corresponding sample in SAMPLES
(based on index). So run1 only pairs with A,B,C and run2 only pairs with D,E,F.
Problem 2: the naming of each output file should reflect this index based pairing. Currently, I am struggling to get each element of each nested list in SAMPLES
to pair with each RUN_ID
Based on the above I want the following output:
"foo/run1/A/outs/A_report.html"
"foo/run1/B/outs/B_report.html"
"foo/run1/C/outs/C_report.html"
"foo/run2/D/outs/D_report.html"
"foo/run2/E/outs/E_report.html"
"foo/run2/F/outs/F_report.html"
Initially I was getting this:
"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run1/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"
"foo/run2/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"
I overcame the undesired pairing using zip
in expand
function:
summary= expand(["foo/{run}/{sample}/outs/{sample}_report.html", "foo/{run}/{sample}/outs/{sample}_report.html"], zip, run=RUN_ID, sample=SAMPLES)
Leaving me with the desired pairing between RUN_ID
and SAMPLES
:
"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"
However, as is seen above, each nested list is being passed into the output path rather than each element of each nested list. I can achieve what I want by just separating the SAMPLES into two different lists but would like a more elegant and automated approach.
I am not stuck on using nested lists either; appreciate any insight into a fix or better approach. Thanks!
expand
is a convenience utility, for more complex cases it's typically faster to generate the desired list directly using python:
RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D","E","F"]]
desired_files = []
for run, SAMPLE in zip(RUN_ID, SAMPLES):
for sample in SAMPLE:
file = f"foo/{run}/{sample}/outs/{sample}_report.html"
desired_files.append(file)
rule all:
input: desired_files