Search code examples
pythonbioinformaticssnakemake

Loss of certain values when using dict and expand together in Snakemake


I have try to use a function to return a dict as a rule input, here is a simple expamle:

# debug.smk
input_a='/{output}/to/{group_name}/input_a'
input_b='/{output}/to/{group_name}/input_b'

rule_a=True
rule_b=True

output='output'
group_name='group_name'

def get_rule_output(): 
    output={}
    if rule_a: 
        output['output_a']=expand(input_a, output=output, group_name=group_name)
        print(expand(input_a, output=output, group_name=group_name))
    if rule_b: 
        output['output_xx']=expand(input_b, output=output, group_name=group_name)

    print(output)
    
    return output

rule all: 
    input: 
        get_rule_output(), 

I use following command to run it

snakemake -s debug.smk

My expected:

get_rule_output() return a dict with two output

{'output_a': ['/output_a/to/group_name/input_a'], 'output_xx': ['/output_a/to/group_name/input_b']}

however, its return value is

{'output_a': [], 'output_xx': ['/output_a/to/group_name/input_b']}

debug output by print() is following

snakemake -s snakemake/debug.smk
['/output_a/to/group_name/input_a']
{'output_a': [], 'output_xx': ['/output_a/to/group_name/input_b']}
Building DAG of jobs...

#other snakemake debug output

Solution

  • You need to rename output={} inside your function:

    In your code output has two meanings:

    • output='output' which is also the value you want to use in expand(...)
    • output={} which is the dictionary you want to hold your results, but which you are using in your expand(...) instead

    Due to the local scope, output={} is used instead of output='output' for your expand(...) statement.

    This works

    # debug.smk
    input_a = "/{output}/to/{group_name}/input_a"
    input_b = "/{output}/to/{group_name}/input_b"
    
    rule_a = True
    rule_b = True
    
    output = "output"
    group_name = "group_name"
    
    
    def get_rule_output():
    
        d= {}
    
        d["output_a"] = expand(input_a, output=output, group_name=group_name)
        d["output_b"] = expand(input_b, output=output, group_name=group_name)
    
        print(d)
    
        return output
    
    
    rule all:
        input:
            get_rule_output(),