
Snakemake multiple input files with expand but no repetitions


I'm new to Snakemake and can't figure out how to solve this problem.

I have a rule with two inputs:

rule test:
    input:
        input_file1=f1,
        input_file2=f2

f1 is in [A{1}$, A{2}£, B{1}€, B{2}¥]

f2 is in [C{1}, C{2}]

The numbers are wildcards that come from an expand call. I need a way to pass f1 and f2 as a pair of files whose numbers match exactly. For example:

f1 = A1

f2 = C1

or

f1 = B1

f2 = C1

I have to avoid combinations such as:

f1 = A1

f2 = C2

I could write a function that matches the files this way, but it would have to handle input_file1 and input_file2 at the same time. I thought of writing a function that builds a dictionary of the allowed combinations, but how would I "iterate" over it during the expand?

Thanks


Solution

  • Assuming rule test produces an output file named {f1}.{f2}.txt, you need some mechanism that correctly pairs f1 and f2 and creates a list of {f1}.{f2}.txt files.

    How you create this list is up to you. expand is just a convenience function for building such lists, and in this case you may want to avoid it.

    Here's a super simple example:

    import re
    
    fin1 = ['A1$', 'A2£', 'B1€', 'B2¥']
    fin2 = ['C1', 'C2']
    
    outfiles = []
    for x in fin1:
        for y in fin2:
            ## Here you pair f1 and f2. This is a very trivial way of doing it:
            if y[1] in x:
                outfiles.append('%s.%s.txt' % (x, y))
    
    wildcard_constraints:
        f1 = '|'.join([re.escape(x) for x in fin1]),
        f2 = '|'.join([re.escape(x) for x in fin2]),
    
    rule all:
        input:
            outfiles,        
    
    rule test: 
        input:
            input_f1 = '{f1}.txt',
            input_f2 = '{f2}.txt',
        output:
            '{f1}.{f2}.txt',
        shell:
            r"""
            cat {input} > {output}
            """
    

    This pipeline will execute the following commands (the order may vary):

    cat A2£.txt C2.txt > A2£.C2.txt
    cat A1$.txt C1.txt > A1$.C1.txt
    cat B1€.txt C1.txt > B1€.C1.txt
    cat B2¥.txt C2.txt > B2¥.C2.txt
    

    If you create the starting input files with touch 'A1$.txt' 'A2£.txt' 'B1€.txt' 'B2¥.txt' 'C1.txt' 'C2.txt', you should be able to run this example.
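
The question also asks about a dictionary of allowed combinations. Below is a minimal sketch of that idea in plain Python, assuming the shared index is the first run of digits in each name. The helper `sample_number` and the variable `f2_by_number` are hypothetical names introduced for illustration, not part of Snakemake. Unlike the `y[1] in x` test, comparing the whole number keeps a name such as C10 from pairing with A1.

```python
import re
from collections import defaultdict

fin1 = ['A1$', 'A2£', 'B1€', 'B2¥']
fin2 = ['C1', 'C2']


def sample_number(name):
    """Return the first run of digits in a name (assumed to be the shared index)."""
    match = re.search(r'\d+', name)
    if match is None:
        raise ValueError(f'no number found in {name!r}')
    return match.group()


# Dictionary of allowed partners: number -> f2 names carrying that number,
# e.g. {'1': ['C1'], '2': ['C2']}.
f2_by_number = defaultdict(list)
for y in fin2:
    f2_by_number[sample_number(y)].append(y)

# Keep only the (f1, f2) pairs whose numbers agree.
pairs = [(x, y) for x in fin1 for y in f2_by_number.get(sample_number(x), [])]

# Target list that could be given to "rule all" in place of the nested loops.
outfiles = [f'{x}.{y}.txt' for x, y in pairs]
print(outfiles)
```

Inside a Snakefile, the same target list could also be built with `expand('{f1}.{f2}.txt', zip, f1=[p[0] for p in pairs], f2=[p[1] for p in pairs])`, since passing `zip` makes expand combine the lists element by element instead of taking their full product.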