Snakemake: wildcards for parameter keys


I'm trying to create a Snakemake rule whose input and output are config parameters selected by a wildcard, but I'm having problems.

I would like to do something like:

config.yaml

cam1:
  raw: "src/in1.avi"
  bg: "out/bg1.png"
cam2:
  raw: "src/in2.avi"
  bg: "out/bg2.png"
cam3:
  raw: "src/in3.avi"
  bg: "out/bg3.png"

Snakefile:

configfile: "config.yaml"

...
rule all:
  input:
    [config[f'cam{id}']['bg'] for id in [1, 2, 3]]

rule make_bg:
  input:
    raw=config["{cam}"]["raw"]
  output:
    bg=config["{cam}"]["bg"]
  shell:
    """
    ./process.py {input.raw} {output.bg}
    """

But this doesn't work: I would like {cam} to be treated as a wildcard, but instead I get a KeyError for {cam}. Can anyone help?

Is it possible to specify {cam} as a wildcard (or something else) that could then be used as a config key?


Solution

  • I think that there are a few problems with this approach:

    Conceptually

    It does not make much sense to specify the exact input and output filenames in a config file, since this is pretty much diametrically opposed to the reason for using Snakemake: to infer, from the inputs, which parts of the pipeline need to be run to create the desired outputs. In this case you would have to edit the config for every input/output pair, and the whole point of automation is lost.

    Now, the actual problem is how to access config values for input and output. Typically you would, for example, provide some directory paths in the config and use something like:

    config.yaml:

    raw_input: 'src'
    bg_output: 'out'
    

    In the pipeline, you could then use it like this:

    input: os.path.join(config['raw_input'], 'in{id}.avi')
    output: os.path.join(config['bg_output'], 'bg{id}.png')
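
    The path handling in that snippet can be tried outside Snakemake. The following is only a sketch in plain Python: the `config` dict stands in for what Snakemake would load from config.yaml, the helper name `build_patterns` is hypothetical, and the output uses the `.png` extension from the question. Note that `{id}` stays a literal placeholder in the joined strings; Snakemake fills it in later through wildcard matching, which `str.format` merely imitates here.

```python
import os

# Stand-in for the dict Snakemake builds from config.yaml (assumed contents).
config = {"raw_input": "src", "bg_output": "out"}

def build_patterns(cfg):
    """Hypothetical helper: join config directories with wildcard patterns."""
    raw_pattern = os.path.join(cfg["raw_input"], "in{id}.avi")
    bg_pattern = os.path.join(cfg["bg_output"], "bg{id}.png")
    return raw_pattern, bg_pattern

raw_pattern, bg_pattern = build_patterns(config)

# "{id}" is still a placeholder at this point; str.format only mimics
# what wildcard substitution would produce for id=1.
print(raw_pattern.format(id=1))
print(bg_pattern.format(id=1))
```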
    

    As noted above, it makes no sense to specify the outputs in particular in the config file.

    If you were to specify the inputs in config.yaml:

    cam1:
      raw: "src/in1.avi"
    cam2:
      raw: "src/in2.avi"
    cam3:
      raw: "src/in3.avi"
    
    

    you could then get the inputs with a function as below:

    import os
    from pathlib import Path

    configfile: "config.yaml"

    # Create empty sample inputs so the example runs end to end
    os.makedirs('src', exist_ok=True)
    for i in [1, 2, 3]:
        Path(f'src/in{i}.avi').touch()

    ids = [1, 2, 3]

    # Input function: map the {id} wildcard to the matching config entry
    def get_raw(wildcards):
        cam = 'cam' + wildcards.id
        return config[cam]['raw']

    rule all:
        input:
            expand('out/bg{id}.png', id=ids)

    # The shell command is a placeholder for the real processing step
    rule make_bg:
        input:
            raw = get_raw
        output:
            bg = 'out/bg{id}.png'
        shell:
            " touch {input.raw} ;"
            " cp {input.raw} {output.bg};"