I want to run a Snakemake workflow where the input is defined by a combination of different variables (e.g. pairs of samples, sample ID and Nanopore barcode,...):
sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]
I've got a rule using these:
rule frobnicate:
    input:
        assembly = "{first_sample}_{second_sample}.txt"
    output:
        frobnicated = "{first_sample}_{second_sample}.frob"
I now want to create a rule all that will do this for some combinations of the samples in sample_1 and sample_2, but not all of them.
Using expand would give me all possible combinations of sample_1 and sample_2.
How can I, for example, just combine the first element of the first list with the first element of the second, and so on (foo_spam.frob, bar_ham.frob, and baz_eggs.frob)?
And what if I want some more complex combination?
expand with other combinatoric functions

By default, expand uses the itertools function product. However, it's possible to specify another function for expand to use.
To combine the first element of the first list with the first element of the second, and so on, one can tell expand to use zip:
sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]
rule all:
    input: expand("{first_sample}_{second_sample}.frob", zip, first_sample=sample_1, second_sample=sample_2)
This will yield foo_spam.frob, bar_ham.frob, and baz_eggs.frob as inputs to rule all.
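To see the difference between the two combinators outside of Snakemake, here is a minimal plain-Python sketch. It does not call expand itself; it only mimics how product and zip pair up the wildcard values, and the pattern variable is a name introduced purely for illustration.

```python
from itertools import product

sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]

# Illustrative stand-in for the pattern string passed to expand
pattern = "{first_sample}_{second_sample}.frob"

# product: every element of sample_1 paired with every element of sample_2
all_combinations = [
    pattern.format(first_sample=first, second_sample=second)
    for first, second in product(sample_1, sample_2)
]

# zip: elements paired position by position
pairwise = [
    pattern.format(first_sample=first, second_sample=second)
    for first, second in zip(sample_1, sample_2)
]

print(len(all_combinations))  # 9 file names
print(pairwise)  # ['foo_spam.frob', 'bar_ham.frob', 'baz_eggs.frob']
```

Note that zip stops at the end of the shorter list, so lists of unequal length would silently drop the unpaired samples.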
The input generated by expand is ultimately just a list of file names. If you can't get where you want to with expand and another combinatoric function, it could be easier to just use regular Python code to generate the list yourself (for an example of this in action, see this question).
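As a rough sketch of that approach, the list can be built with an ordinary list comprehension. The example below assumes you want every combination except a few known-bad pairs; excluded_pairs is a hypothetical name and its contents are made up for illustration.

```python
from itertools import product

sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]

# Hypothetical: pairs that should not be processed
excluded_pairs = {("foo", "spam"), ("baz", "eggs")}

# Build the target file names, skipping the excluded pairs
all_frobnicated = [
    f"{first}_{second}.frob"
    for first, second in product(sample_1, sample_2)
    if (first, second) not in excluded_pairs
]

print(all_frobnicated)
```

Since a Snakefile is Python underneath, code like this can sit at the top of the Snakefile, and all_frobnicated can then be given directly as the input of rule all.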
If your combination of inputs can't be arrived at programmatically at all, one last resort would be to write out the combinations you want by hand. For example:
sample_1 = ["foo", "bar", "baz"]
sample_2 = ["spam", "ham", "eggs"]
all_frobnicated = ["foo_eggs.frob", "bar_spam.frob", "baz_ham.frob"]
rule all:
    input: all_frobnicated
This does, of course, mean your inputs are completely hardcoded, so if you want to use this workflow with a new batch, you'll have to write out the sample combinations for that batch by hand as well.