python, list-comprehension, nested-lists, snakemake

How to access each element of a nested list for naming output in snakemake?


This is a similar problem: Snakemake: using a python nested list comprehension for conditional analyses

I have the following:

RUN_ID = ["run1", "run2"]

SAMPLES = [["A", "B", "C"], ["D","E","F"]]
rule all:
    input:
        summary = expand("foo/{run}/{sample}/outs/{sample}_report.html", run=RUN_ID, sample=SAMPLES)

Problem 1: each run in RUN_ID should only be associated with the corresponding sample list in SAMPLES (based on index). So run1 only pairs with A, B, C and run2 only pairs with D, E, F.

Problem 2: the name of each output file should reflect this index-based pairing. Currently, I am struggling to get each element of each nested list in SAMPLES to pair with its RUN_ID.

Based on the above I want the following output:

"foo/run1/A/outs/A_report.html"
"foo/run1/B/outs/B_report.html"
"foo/run1/C/outs/C_report.html"

"foo/run2/D/outs/D_report.html"
"foo/run2/E/outs/E_report.html"
"foo/run2/F/outs/F_report.html"

Initially I was getting this:

"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run1/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

"foo/run2/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"
"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

I overcame the undesired pairing by passing zip to the expand function:

summary= expand(["foo/{run}/{sample}/outs/{sample}_report.html", "foo/{run}/{sample}/outs/{sample}_report.html"], zip, run=RUN_ID, sample=SAMPLES)

This leaves me with the desired pairing between RUN_ID and SAMPLES:

"foo/run1/["A", "B", "C"]/outs/["A", "B", "C"]_report.html"

"foo/run2/["D", "E", "F"]/outs/["D", "E", "F"]_report.html"

However, as seen above, each nested list is passed into the output path as a whole rather than element by element. I can achieve what I want by splitting SAMPLES into two separate lists, but I would like a more elegant and automated approach.

I am not tied to nested lists either; I would appreciate any insight into a fix or a better approach. Thanks!


Solution

  • expand is a convenience utility; for more complex cases it's typically quicker to generate the desired list directly in Python:

    RUN_ID = ["run1", "run2"]
    SAMPLES = [["A", "B", "C"], ["D", "E", "F"]]

    # zip pairs each run with its own sample list by index;
    # the inner loop then emits one path per individual sample.
    desired_files = []
    for run, run_samples in zip(RUN_ID, SAMPLES):
        for sample in run_samples:
            path = f"foo/{run}/{sample}/outs/{sample}_report.html"
            desired_files.append(path)

    rule all:
        input: desired_files
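
  • Since nested lists are not a requirement, one alternative is to keep the run-to-sample pairing in a single dict and flatten it with a list comprehension. This is only a sketch: the name RUN_SAMPLES is made up for illustration, and it relies on dicts preserving insertion order (Python 3.7+).

```python
# Hypothetical layout: one dict keyed by run ID instead of two parallel lists.
RUN_SAMPLES = {
    "run1": ["A", "B", "C"],
    "run2": ["D", "E", "F"],
}

# One path per (run, sample) pair; each run only sees its own samples.
desired_files = [
    f"foo/{run}/{sample}/outs/{sample}_report.html"
    for run, samples in RUN_SAMPLES.items()
    for sample in samples
]

# In the Snakefile, this list would be passed to the target rule, e.g.:
# rule all:
#     input: desired_files
```

    Keeping the pairing in one structure means a run and its samples cannot drift out of sync the way two parallel lists can.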