python list-comprehension nested-lists snakemake

How to access each element of a nested list for naming output in snakemake?

I have the following:

RUN_ID = ["run1", "run2"]

SAMPLES = [["A", "B", "C"], ["D","E","F"]]

rule all:
    input:
        summary = expand("foo/{run}/{sample}/outs/{sample}_report.html", run=RUN_ID, sample=SAMPLES)

Problem 1: each run in RUN_ID should pair only with the sample list at the same index in SAMPLES. So run1 pairs only with A, B, C and run2 pairs only with D, E, F.

Problem 2: the name of each output file should reflect this index-based pairing. Currently, I am struggling to get each element of each nested list in SAMPLES to pair with its RUN_ID.

Based on the above I want the following output:

"foo/run1/A/outs/A_report.html"
"foo/run1/B/outs/B_report.html"
"foo/run1/C/outs/C_report.html"

"foo/run2/D/outs/D_report.html"
"foo/run2/E/outs/E_report.html"
"foo/run2/F/outs/F_report.html"

Initially I was getting this:

"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run1/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

"foo/run2/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

I overcame the undesired pairing by using zip in the expand function:

summary= expand(["foo/{run}/{sample}/outs/{sample}_report.html", "foo/{run}/{sample}/outs/{sample}_report.html"], zip, run=RUN_ID, sample=SAMPLES)

Leaving me with the desired pairing between RUN_ID and SAMPLES:

"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"

"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

However, as seen above, each nested list is passed into the output path as a whole, rather than each of its elements. I can achieve what I want by splitting SAMPLES into two separate lists, but I would like a more elegant and automated approach.

I am not stuck on using nested lists either; appreciate any insight into a fix or better approach. Thanks!

Solution

expand is a convenience utility; for more complex cases it's typically faster to generate the desired list directly in Python:

RUN_ID = ["run1", "run2"]
SAMPLES = [["A", "B", "C"], ["D","E","F"]]

desired_files = []
# Pair each run with the sample list at the same index, then emit one path per sample.
for run, samples in zip(RUN_ID, SAMPLES):
    for sample in samples:
        desired_files.append(f"foo/{run}/{sample}/outs/{sample}_report.html")

rule all:
    input: desired_files
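
Since the question says nested lists are not a requirement, here is a minimal sketch of the same idea using a dict keyed by run, which keeps each run next to its own samples. The names SAMPLES_BY_RUN, flat_runs and flat_samples are made up for illustration and do not come from the original post.

```python
# Hypothetical mapping of run ID -> samples; replaces the parallel RUN_ID/SAMPLES lists.
SAMPLES_BY_RUN = {
    "run1": ["A", "B", "C"],
    "run2": ["D", "E", "F"],
}

# One path per (run, sample) pair; dicts keep insertion order in Python 3.7+.
desired_files = [
    f"foo/{run}/{sample}/outs/{sample}_report.html"
    for run, samples in SAMPLES_BY_RUN.items()
    for sample in samples
]

# Flattened parallel lists of equal length: one run entry per sample.
flat_runs = [run for run, samples in SAMPLES_BY_RUN.items() for _ in samples]
flat_samples = [s for samples in SAMPLES_BY_RUN.values() for s in samples]
```

In a Snakefile, desired_files can be used directly as the input of rule all. If you would rather keep expand, the two flattened lists should work with its zip mode, e.g. expand("foo/{run}/{sample}/outs/{sample}_report.html", zip, run=flat_runs, sample=flat_samples), because zip pairs the wildcard values position by position.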