Multiple named inputs in Snakefile


I want to make a pipeline that looks like this:

  1. For each dataset, extract some features.
  2. Make a unique list of all features.
  3. Extract the features in that unique list from each of the original datasets.

Here is a basic example of where I am so far:

input_dict = {"data1": "/path/to/data1", "data2": "/path/to/data2"}

rule all:
    input: 
        expand('data/{dataset}.processed', dataset=input_dict.keys())

rule extract_master:
    output:
        'data/{dataset}.processed'
    input:
        master = rules.master_list.output, dataset = lambda wildcards: input_dict[wildcards.dataset]
    shell:
        "./extract_master.py --input {input.dataset} --out {output} --master {input.master}"

rule master_list:
    output:
        'data/master.txt'
    input:
        expand('data/{dataset}.chunk', dataset=input_dict.keys())
    shell:
        './master_list.py --input {input} --output {output}'

rule get_chunk:
    input:
        lambda wildcards: input_dict[wildcards.dataset]
    output:
        'data/{dataset}.chunk'
    shell:
        "./get_chunk.py --input {input} --output {output}"

I get an error:

'Rules' object has no attribute 'master_list'

I don't know how to specify two named inputs, where each input is not a simple string. If there is syntax I can use for the input section in the extract_master rule to fix this, that would be great. Otherwise, any thoughts on a better approach would be gladly received.


Solution

  • Importantly, be aware that referring to rule a here requires that rule a was defined above rule b in the file, since the object has to be known already. This feature also allows us to resolve dependencies that are ambiguous when using filenames.

    Source

    That is, in your example, rule master_list should be defined before rule extract_master.
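
    As a sketch of that reordering, here is the Snakefile from the question with the rules rearranged so that master_list appears before extract_master. Only the order changes (plus one input per line for readability); the helper scripts and paths are the ones from the question and are assumed to exist, and this has not been run against your data.

```snakemake
input_dict = {"data1": "/path/to/data1", "data2": "/path/to/data2"}

# Target rule: refers to files by name only, so it can stay at the top.
rule all:
    input:
        expand('data/{dataset}.processed', dataset=input_dict.keys())

rule get_chunk:
    input:
        lambda wildcards: input_dict[wildcards.dataset]
    output:
        'data/{dataset}.chunk'
    shell:
        "./get_chunk.py --input {input} --output {output}"

# Must be defined before any rule that uses rules.master_list.
rule master_list:
    output:
        'data/master.txt'
    input:
        expand('data/{dataset}.chunk', dataset=input_dict.keys())
    shell:
        './master_list.py --input {input} --output {output}'

# rules.master_list is known by now, so the named input resolves.
rule extract_master:
    output:
        'data/{dataset}.processed'
    input:
        master = rules.master_list.output,
        dataset = lambda wildcards: input_dict[wildcards.dataset]
    shell:
        "./extract_master.py --input {input.dataset} --out {output} --master {input.master}"
```

    The two named inputs in extract_master were already written correctly; the error came only from looking up rules.master_list before that rule had been parsed. If you would rather keep the original order, writing master = 'data/master.txt' as a plain filename should also work, since filenames are matched against rule outputs regardless of where the rules appear in the file.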