Tags: python, bioinformatics, snakemake

Wildcard within wildcard in Snakemake


I have very recently started using Snakemake.

What I am trying to achieve is this: I have some samples which need to be modified, independently (rule index_vcf). Then, groups of samples are the input of another rule (analyse), and we get an output for each group in groups.

I would like the second rule to run two commands:

some_script --input_files 'A2 Ah AL' --output ../out/A.out

and

some_script --input_files 'Banana BLM' --output ../out/B.out

I know how to do it for a single group. With both groups, however, the wildcard sample_from_group that I expand in analyse needs to depend on the group, and I get the error

unhashable type: 'list'

This is my config file:

groups:
-   A
-   B

samples:
- A2
- Ah
- AL
- Banana
- BLM

grouped_samples:
  A: A2_mod, Ah_mod, AL_mod
  B: Banana_mod, BLM_mod

and this is my Snakefile

configfile: "config_PCAWG.yaml"
samples = config["samples"]
groups = config["groups"]
grouped_samples = config["grouped_samples"]


rule all:
    input:
      expand("../out/{group}.out", group = groups)

rule index_vcf:
    input:
        "../data/{sample}"
    output:
        "../data/{sample}_mod"
    shell:
        "tabix -f {input}"

rule analyse:
    input:
        expand("{sample_from_group}", sample_from_group=grouped_samples[{group}].split())
    output:
         "../out/{group}.out
    shell:
        "some_script --input_files '{input}' --output {output}"

Solution

  • Firstly, you need to correct the misprints (e.g. there is a string that has no closing quote). Next, there is a logical error in that no rule produces anything that matches the pattern "../out/{group}.out". Did you mean "../data/{group}.out"?

    Now the main part. This is invalid syntax, because outside a string Python reads {group} as a set literal rather than as a Snakemake wildcard:

            expand("{sample_from_group}", sample_from_group=grouped_samples[{group}].split())
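
    The sketch below uses plain Python, without Snakemake, to show what braces outside a string do. It is only one plausible route to the "unhashable type" message from the question: the helper error_message is hypothetical, and the dictionary simply copies the config values.

```python
# Plain Python, no Snakemake needed: braces outside a string build a set.
grouped_samples = {"A": "A2_mod, Ah_mod, AL_mod", "B": "Banana_mod, BLM_mod"}
groups = ["A", "B"]


def error_message(func):
    """Call func and return the TypeError message it raises, or None."""
    try:
        func()
    except TypeError as exc:
        return str(exc)
    return None


# A set literal whose element is a list cannot be built at all.
list_in_set = error_message(lambda: {groups})

# A set holding a string is valid, but it cannot be used as a dict key.
group = "A"
set_as_key = error_message(lambda: grouped_samples[{group}])

print(list_in_set)  # message mentions: unhashable type: 'list'
print(set_as_key)   # message mentions: unhashable type: 'set'
```

    Either way, the braces never reach Snakemake as a wildcard, so the lookup fails before any rule is evaluated.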
    

    What you need is an input function (a lambda or a named function) that takes the wildcards object and returns the expanded list:

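    rule analyse:
        input:
            # The lambda is called once per job, with the wildcards already resolved.
            # split(", ") matches the comma-separated strings in the config file.
            lambda wildcards: expand("{sample_from_group}",
                                 sample_from_group=grouped_samples[wildcards.group].split(", "))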
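
The lookup inside the input function can be checked in plain Python before it goes into the Snakefile. This is a minimal sketch under a few assumptions: samples_for_group is a hypothetical helper name, SimpleNamespace stands in for Snakemake's wildcards object, and the dictionary copies the grouped_samples section of the config.

```python
from types import SimpleNamespace

# Mirrors the grouped_samples section of the question's config file.
grouped_samples = {
    "A": "A2_mod, Ah_mod, AL_mod",
    "B": "Banana_mod, BLM_mod",
}


def samples_for_group(wildcards):
    """Return the sample names listed for wildcards.group."""
    raw = grouped_samples[wildcards.group]
    # Split on commas and strip whitespace; a bare .split() would split on
    # whitespace only and leave the commas attached to the names.
    return [name.strip() for name in raw.split(",")]


# SimpleNamespace stands in for Snakemake's wildcards object (assumption).
print(samples_for_group(SimpleNamespace(group="A")))
print(samples_for_group(SimpleNamespace(group="B")))
```

In the Snakefile, a function like this could be given directly as the input of analyse in place of the lambda, since Snakemake calls input functions with the wildcards object. The question's index_vcf rule writes to ../data/{sample}_mod, so the returned names would probably also need that path prefix for the two rules to connect.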