Snakemake expand on a dictionary, keeping wildcards


I have a dictionary like the following:

data = {
    "group1": ["a", "b", "c"], 
    "group2": ["x", "y", "z"]
}

I want to use expand in "rule all" to combine each key only with its own values, so that the expected output files are e.g. "group1/a.txt", "group1/b.txt", ..., "group2/x.txt", "group2/y.txt", ...

rule all: 
    input: 
        expand("{group}/{sub_group}.txt", group = ???, sub_group = ???)

I need this for the rule "some_rule":


rule some_rule: 
    input: "single_input_file.txt"
    output: "{group}/{sub_group}.txt"
    params: 
        group=group, # how do I extract these placeholders?
        sub_group=sub_group
    script: 
        "some_script.R"

I need the group and sub_group wildcards because I have to pass them to the params of rule "some_rule".

I tried to hardcode all required output files in "rule all" with a list comprehension, but then the placeholders are not defined in the wildcards and I cannot pass them to the params.

So I guess I need to define the "rule all" input files using expand, but I don't know how to get the correct files, because each group must be combined only with its own values ("group1" with its values, "group2" with its values).

I also cannot use an input function for the rule "some_rule", as it has only a single static input file.

In other similar questions on StackOverflow, either the combinatorial problem does not arise, or the input files for "rule all" are created using plain Python, which makes me lose the wildcards.


Solution

  • I found a solution for my problem using a custom combinator function.

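    import itertools

    def pairwise_product(*args):
        # Pair each group only with its own sub-groups.
        result = []
        for group, sub_group in zip(*args):
            # Wrap the wildcard name in a list so product() pairs it with every value.
            sub_group = ([sub_group[0]], sub_group[1])
            for sub_sub_group in itertools.product(*sub_group):
                result.append((group, sub_sub_group))
        return result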
    

    Looking at the source code of Snakemake's expand function, I realized that I can pass my own combinator function as its second argument.

    pairwise_product expects two lists of (wildcard name, wildcard value) tuples as input. In the second list, each value is itself the list of sub-groups belonging to the corresponding group, e.g.

    wildcard1 = [("group", "group1"), ("group", "group2")]
    wildcard2 = [("sub_group", ["a", "b", "c"]), ("sub_group", ["x", "y", "z"])]
    pairwise_product(wildcard1, wildcard2)
    

    The output of this function call would be:

    [(('group', 'group1'), ('sub_group', 'a')),
     (('group', 'group1'), ('sub_group', 'b')),
     (('group', 'group1'), ('sub_group', 'c')),
     (('group', 'group2'), ('sub_group', 'x')),
     (('group', 'group2'), ('sub_group', 'y')),
     (('group', 'group2'), ('sub_group', 'z'))]
    

    And the output of the expand function would be:

    expand("{group}/{sub_group}.txt", pairwise_product, group=data.keys(), sub_group=data.values())
    
    ['group1/a.txt',
     'group1/b.txt',
     'group1/c.txt',
     'group2/x.txt',
     'group2/y.txt',
     'group2/z.txt']
    

    With this solution I also get the wildcards I want: each dictionary key is paired only with the individual elements of its own value list.

    Note that this function is designed for exactly two wildcards in the format of the data dictionary shown above; it has not been tested with other formats.
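
To check the combinator without running a workflow, here is a minimal sketch that runs in plain Python. expand_sketch is a hypothetical stand-in written for this example, not Snakemake's actual implementation; it only assumes that expand turns each keyword argument into a list of (name, value) pairs, hands those lists to the combinator, and formats the pattern once per returned combination.

```python
import itertools

data = {
    "group1": ["a", "b", "c"],
    "group2": ["x", "y", "z"],
}


def pairwise_product(*args):
    # Same combinator as above: pair each group only with its own sub-groups.
    result = []
    for group, sub_group in zip(*args):
        sub_group = ([sub_group[0]], sub_group[1])
        for sub_sub_group in itertools.product(*sub_group):
            result.append((group, sub_sub_group))
    return result


def expand_sketch(pattern, combinator, **wildcards):
    # Hypothetical stand-in for expand (an assumption, not the library code):
    # build one list of (name, value) pairs per wildcard ...
    pairs = [
        [(name, value) for value in values]
        for name, values in wildcards.items()
    ]
    # ... then format the pattern once per combination the combinator returns.
    return [pattern.format(**dict(combo)) for combo in combinator(*pairs)]


targets = expand_sketch(
    "{group}/{sub_group}.txt",
    pairwise_product,
    group=data.keys(),
    sub_group=data.values(),
)

# Cross-check: the same list built directly from the dictionary.
direct = [
    f"{group}/{sub_group}.txt"
    for group, sub_groups in data.items()
    for sub_group in sub_groups
]

print(targets)
print(targets == direct)
```

Both lists contain the same six paths. Since Snakemake infers wildcard values by matching each requested file against a rule's output pattern, the target list should resolve the same way however it is built; treat that as something to verify in your own workflow.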