I have a dictionary like the following:
data = {
    "group1": ["a", "b", "c"],
    "group2": ["x", "y", "z"],
}
I want to use expand in "rule all" to combine each key only with its own values, so that the expected output files are e.g. "group1/a.txt", "group1/b.txt", ... "group2/x.txt", "group2/y.txt", ...
rule all:
    input:
        expand("{group}/{sub_group}.txt", group = ???, sub_group = ???)
I need this for the rule "some_rule":
rule some_rule:
    input: "single_input_file.txt"
    output: "{group}/{sub_group}.txt"
    params:
        group=group, # how do I extract these placeholders?
        sub_group=sub_group
    script:
        "some_script.R"
The reason I need the group and sub_group wildcards is that I need to pass them to the params of rule "some_rule".
I tried to hardcode all output files needed in the "rule all" with a list comprehension, but then the placeholders are not defined in the wildcards and I cannot pass them to the params.
So I guess I need to define the "rule all" input files using expand, but here I don't know how to get the correct files, as I need the combinations to be formed separately between "group1" and its values and between "group2" and its values.
I also cannot use an input function for the rule "some_rule", as it has only a single static input file.
In other similar questions on StackOverflow, either the combinatorial problem does not arise, or the input files for "rule all" are created using plain Python, which makes me lose the wildcards.
I found a solution for my problem using a custom combinator function.
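import itertools

def pairwise_product(*args):
    result = []
    # Walk the two wildcard lists in parallel, one (group, sub_group) pair at a time.
    for group, sub_group in zip(*args):
        # Wrap the wildcard name in a list so it can be crossed with its values.
        sub_group = ([sub_group[0]], sub_group[1])
        for sub_sub_group in itertools.product(*sub_group):
            result.append((group, sub_sub_group))
    return result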
Looking at the source code for snakemake's expand function, I realized that I can use my own combinator function.
pairwise_product expects as input two lists of tuples, where each tuple contains the wildcard name and the wildcard value, e.g.
wildcard1 = [("group", "group1"), ("group", "group2")]
wildcard2 = [("sub_group", ["a", "b", "c"]), ("sub_group", ["x", "y", "z"])]
pairwise_product(wildcard1, wildcard2)
The output of this function call would be:
[(('group', 'group1'), ('sub_group', 'a')),
(('group', 'group1'), ('sub_group', 'b')),
(('group', 'group1'), ('sub_group', 'c')),
(('group', 'group2'), ('sub_group', 'x')),
(('group', 'group2'), ('sub_group', 'y')),
(('group', 'group2'), ('sub_group', 'z'))]
And the output of the expand function would be:
expand("{group}/{sub_group}.txt", pairwise_product, group=data.keys(), sub_group=data.values())
['group1/a.txt',
'group1/b.txt',
'group1/c.txt',
'group2/x.txt',
'group2/y.txt',
'group2/z.txt']
With this solution I also get the wildcards I want, i.e. the individual elements in the list-values for each dictionary key separately.
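To check the combinator without running Snakemake, the following minimal sketch mimics in plain Python the calling convention described above: each keyword argument becomes a list of (name, value) tuples, the combinator is applied to those lists, and each resulting combination fills the pattern. The helpers flatten and expand_sketch are hypothetical stand-ins written for this illustration, not Snakemake functions, and the real expand handles more cases than this.

```python
import itertools


def pairwise_product(*args):
    result = []
    for group, sub_group in zip(*args):
        sub_group = ([sub_group[0]], sub_group[1])
        for sub_sub_group in itertools.product(*sub_group):
            result.append((group, sub_sub_group))
    return result


def flatten(wildcards):
    # Hypothetical helper: one list of (name, value) tuples per wildcard.
    return [
        [(name, value) for value in values]
        for name, values in wildcards.items()
    ]


def expand_sketch(pattern, combinator, **wildcards):
    # Hypothetical stand-in for expand: combine first, then fill the pattern.
    combos = combinator(*flatten(wildcards))
    return [pattern.format(**dict(combo)) for combo in combos]


data = {
    "group1": ["a", "b", "c"],
    "group2": ["x", "y", "z"],
}

paths = expand_sketch(
    "{group}/{sub_group}.txt",
    pairwise_product,
    group=data.keys(),
    sub_group=data.values(),
)
print(paths)
```

Each key is paired only with the values from its own list, so no cross-group paths such as "group1/x.txt" are produced.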
Note that pairwise_product has been designed for only two wildcards in the format shown above in the data dictionary and has not been tested for other formats.
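If a custom combinator is not wanted, one possible alternative is to flatten the dictionary into two equally long lists and pair them by position; Snakemake's expand accepts Python's built-in zip as its combinator for this kind of pairing. The sketch below builds the two lists and formats the paths in plain Python, so it runs without Snakemake. The variable names groups and sub_groups are illustrative only.

```python
data = {
    "group1": ["a", "b", "c"],
    "group2": ["x", "y", "z"],
}

# Repeat each key once per value so both lists line up position by position.
groups = [group for group, subs in data.items() for _ in subs]
sub_groups = [sub for subs in data.values() for sub in subs]

# Pair the lists positionally, as zip would when used as a combinator.
paths = [f"{group}/{sub}.txt" for group, sub in zip(groups, sub_groups)]
print(paths)
```

In a Snakefile, the same two lists could then be passed as expand("{group}/{sub_group}.txt", zip, group=groups, sub_group=sub_groups).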