Let's say I have sample SAMPLE_A, divided into two files SAMPLE_A_1 and SAMPLE_A_2 and associated with barcodes AATT and TTAA, and sample SAMPLE_B, divided into four files SAMPLE_B_1...SAMPLE_B_4 and associated with barcodes CCGG, GGCC and GCGC.

I can create getSampleNames() to get [SAMPLE_A, SAMPLE_A, SAMPLE_B, SAMPLE_B, SAMPLE_B, SAMPLE_B] and [1, 2, 1, 2, 3, 4], and then zip them to get the combination {sample}_{id}. I can then do the same thing for the barcodes: [SAMPLE_A, SAMPLE_A, SAMPLE_B, SAMPLE_B, SAMPLE_B] and [AATT, TTAA, CCGG, GGCC, GCGC].
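For reference, a minimal sketch of what these two helpers could look like; the hardcoded dictionaries are assumptions, and in a real pipeline they would come from a sample sheet or the filesystem:

def getSampleNames():
    # hypothetical implementation: values are hardcoded here, but they
    # could just as well be read from a sample sheet or from disk
    ids = {'SAMPLE_A': [1, 2], 'SAMPLE_B': [1, 2, 3, 4]}
    samples = [s for s, nums in ids.items() for _ in nums]
    numbers = [n for nums in ids.values() for n in nums]
    return samples, numbers  # parallel lists, ready for a zip-expand

def getBCs(samples):
    # hypothetical implementation mapping each sample to its barcodes;
    # sorted() because the caller may pass an unordered set
    bcs = {'SAMPLE_A': ['AATT', 'TTAA'], 'SAMPLE_B': ['CCGG', 'GGCC', 'GCGC']}
    flat = [(s, b) for s in sorted(samples) for b in bcs[s]]
    return [s for s, _ in flat], [b for _, b in flat]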
SAMPLES_ID, IDs = getSampleNames()
SAMPLES_BC, BCs = getBCs(set(SAMPLES_ID))

rule refine:
    input:
        '{sample}/demultiplex/{sample}_{id}.demultiplex.bam'
    output:
        bam = '{sample}/polyA_trimming/{sample}_{id}.fltnc.bam',
    shell:
        "isoseq3 refine {input}"

rule split:
    input:
        expand('{sample}/polyA_trimming/{sample}_{id}.fltnc.bam',
               zip, sample = SAMPLES_ID, id = IDs),
    output:
        expand("{sample}/cells/{barcode}_{sample}/fltnc.bam",
               zip, sample = SAMPLES_BC, barcode = BCs),
    params:
        script_dir = config["script_dir"]  # assumed to be defined, e.g. in the config
    shell:
        "python {params.script_dir}/split_cells_bam.py"

rule dedup_split:
    input:
        "{sample}/cells/{barcode}_{sample}/fltnc.bam"
    output:
        bam = "{sample}/cells/{barcode}_{sample}/dedup/dedup.bam",
    shell:
        "isoseq3 dedup {input} {output.bam}"

rule merge:
    input:
        expand("{sample}/cells/{barcode}_{sample}/dedup/dedup.bam",
               zip, sample = SAMPLES_BC, barcode = BCs),
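To spell out the zip behaviour: expand() with zip pairs the two lists element-wise instead of taking their Cartesian product, so the split rule's input resolves to exactly one path per (sample, id) pair:

expand('{sample}/polyA_trimming/{sample}_{id}.fltnc.bam',
       zip, sample = SAMPLES_ID, id = IDs)
# ['SAMPLE_A/polyA_trimming/SAMPLE_A_1.fltnc.bam',
#  'SAMPLE_A/polyA_trimming/SAMPLE_A_2.fltnc.bam',
#  'SAMPLE_B/polyA_trimming/SAMPLE_B_1.fltnc.bam',
#  'SAMPLE_B/polyA_trimming/SAMPLE_B_2.fltnc.bam',
#  'SAMPLE_B/polyA_trimming/SAMPLE_B_3.fltnc.bam',
#  'SAMPLE_B/polyA_trimming/SAMPLE_B_4.fltnc.bam']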
How can I prevent the split rule from being a bottleneck in my pipeline? For now it waits for the refine rule to finish for all samples, which is not necessary: every sample should run independently. But I can't do that, because the set of barcodes is different for each sample. Is there a way to have something like

expand("{sample}/cells/{barcode}_{sample}/fltnc.bam", zip, sample = SAMPLES_BC, barcode = BCs[SAMPLES_BC])

where each {sample} of SAMPLES_BC is a key in the BCs dictionary? And the same for IDs? I know I can use functions, but then I'm not sure how to propagate the {barcode} through the rules.
I found how to use dictionaries through input functions, which solved my problem!

The major drawback of this solution is that you have to create a dummy file as the output of the split rule, instead of checking that each '{sample}/cells/{barcode}_{sample}/fltnc.bam' file is created, so I am still looking for something more elegant...
IDs = getSampleNames()  # {SAMPLE_A: [1, 2], SAMPLE_B: [1, 2, 3, 4]}
SAMPLES = list(IDs.keys())
BCs = getBCs(SAMPLES)  # {SAMPLE_A: [AATT, TTAA], SAMPLE_B: [CCGG, GGCC, GCGC]}

# function linking IDs and SAMPLE
def sample2ids(wildcards):
    return expand('{{sample}}/polyA_trimming/{{sample}}_{id}.fltnc.bam',
                  id = IDs[wildcards.sample])

# function linking BCs and SAMPLE
def sample2bcs(wildcards):
    return expand('{{sample}}/cells/{barcode}_{{sample}}/dedup/dedup.bam',
                  barcode = BCs[wildcards.sample])

rule refine:
    input:
        '{sample}/demultiplex/{sample}_{id}.demultiplex.bam'
    output:
        bam = '{sample}/polyA_trimming/{sample}_{id}.fltnc.bam',

rule split:
    input:
        sample2ids
    output:
        # cannot use a function here, so I create a dummy file to pipe;
        # it has to carry the {sample} wildcard, otherwise the input
        # function cannot resolve wildcards.sample
        '{sample}/dummy_file.txt'

rule dedup_split:
    input:
        '{sample}/dummy_file.txt'
    output:
        bam = "{sample}/cells/{barcode}_{sample}/dedup/dedup.bam",

rule merge:
    input:
        sample2bcs
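A small refinement, assuming only Snakemake's built-in touch() output flag: marking the dummy output with touch() lets Snakemake create and timestamp the flag file itself once the rule succeeds, so the split script never has to know about it. A sketch (the shell line is a hypothetical placeholder):

rule split:
    input:
        sample2ids
    output:
        # touch() makes Snakemake create the flag file itself after
        # the rule's commands have succeeded
        touch('{sample}/dummy_file.txt')
    shell:
        "python split_cells_bam.py {input}"  # hypothetical invocation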