Tags: python, dictionary, wildcard, combinatorics, snakemake

How to use a wildcard within expand function parameters in snakemake?


I have a JSON config file like so:

{
    "foo": {
        "bar1": 
            {"A1": {"name": "A1", "path": "/path/to/A1"}, 
             "B1": {"name": "B1", "path": "/path/to/B1"},
             "C1": {"name": "C1", "path": "/path/to/C1"},
             "D1": {"name": "D1", "path": "/path/to/D1"}},
        "bar2": 
            {"A2": {"name": "A2", "path": "/path/to/A2"}, 
             "B2": {"name": "B2", "path": "/path/to/B2"},
             "C2": {"name": "C2", "path": "/path/to/C2"},
             "D2": {"name": "D2", "path": "/path/to/D2"}}}
}

I am trying to run my snakemake pipeline on the samples in sample sets 'bar1' and 'bar2' separately, putting the results into their own folders. When I expand my wildcards I don't want every combination of sample set and sample; I just want each sample under its own sample set, like this:

tmp/bar1/A1.bam
tmp/bar1/B1.bam
tmp/bar1/C1.bam
tmp/bar1/D1.bam
tmp/bar2/A2.bam
tmp/bar2/B2.bam
tmp/bar2/C2.bam
tmp/bar2/D2.bam

Hopefully my snakefile will help explain. I have tried having my snakefile like this:

sample_sets = [ i for i in config['foo'] ]

samples_dict = config['foo'] #cleans it up

def get_samples(wildcards):
    return list(samples_dict[wildcards.sample_set].keys())

rule all:
    input:
        expand(expand("tmp/{{sample_set}}/{sample}.bam", sample = get_samples), sample_set = sample_sets),

This doesn't work: my file names end up with "<function get_samples at 0x7f6e00544320>" in them! I have also tried:

rule all:
    input:
        expand(expand("tmp/{{sample_set}}/{sample}.bam", sample = list(samples_dict["{{sample_set}}"].keys()), sample_set = sample_sets),

but that gets a KeyError. I have also tried this:

rule all:
    input:
        [ ["tmp/{{sample_set}}/{sample}.aligned_bam.core.bam".format( sample = sample ) for sample in list(samples_dict[sample_set].keys())] for sample_set in sample_sets ]

which gets a "Wildcards in input files cannot be determined from output files: 'sample_set'" error.

I feel like there must be a simple way of doing this and perhaps I'm being a moron.

Any help would be very much appreciated! And let me know if I've missed some detail.


Solution

  • expand accepts a custom combinatoric function in place of its default, which generates every combination. Most often this function is zip; in your case, however, the nested dictionary shape would require designing a custom function. A simpler solution is to use plain Python to construct the list of desired files.

    d = {
        "foo": {
            "bar1": {
                "A1": {"name": "A1", "path": "/path/to/A1"},
                "B1": {"name": "B1", "path": "/path/to/B1"},
                "C1": {"name": "C1", "path": "/path/to/C1"},
                "D1": {"name": "D1", "path": "/path/to/D1"},
            },
            "bar2": {
                "A2": {"name": "A2", "path": "/path/to/A2"},
                "B2": {"name": "B2", "path": "/path/to/B2"},
                "C2": {"name": "C2", "path": "/path/to/C2"},
                "D2": {"name": "D2", "path": "/path/to/D2"},
            },
        }
    }
    
    list_files = []
    
    # Walk the sample sets, then only the samples that belong to each set
    for key in d["foo"]:
        for nested_key in d["foo"][key]:
            _tmp = f"tmp/{key}/{nested_key}.bam"
            list_files.append(_tmp)
    
    # Print one path per line
    print(*list_files, sep="\n")
    #tmp/bar1/A1.bam
    #tmp/bar1/B1.bam
    #tmp/bar1/C1.bam
    #tmp/bar1/D1.bam
    #tmp/bar2/A2.bam
    #tmp/bar2/B2.bam
    #tmp/bar2/C2.bam
    #tmp/bar2/D2.bam
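
To tie this back to the Snakefile in the question, here is a minimal sketch of two ways the same list could be built from the config. It assumes the JSON is loaded into Snakemake's `config` object (for example through `configfile:`); the dictionary below is a trimmed stand-in for it, and the names `paired_sets`, `paired_samples` and `zipped_files` are made up for illustration. The `rule all` snippets are shown as comments because they are Snakefile syntax rather than plain Python. Since `rule all` has no output files, it has no wildcards to pass to an input function, so the list has to be fully resolved up front.

```python
# Trimmed stand-in for the `config` object built from the JSON file
# (assumption: only the keys matter here, so the name/path values are omitted).
config = {
    "foo": {
        "bar1": {"A1": {}, "B1": {}, "C1": {}, "D1": {}},
        "bar2": {"A2": {}, "B2": {}, "C2": {}, "D2": {}},
    }
}
samples_dict = config["foo"]

# Option 1: a single comprehension instead of the nested loops.
list_files = [
    f"tmp/{sample_set}/{sample}.bam"
    for sample_set, samples in samples_dict.items()
    for sample in samples
]

# Option 2: flatten the dictionary into two parallel lists, which is the
# shape that expand(..., zip, ...) works with.
pairs = [
    (sample_set, sample)
    for sample_set, samples in samples_dict.items()
    for sample in samples
]
paired_sets = [sample_set for sample_set, _ in pairs]
paired_samples = [sample for _, sample in pairs]

# In the Snakefile, either of these could then serve as the target rule:
#
# rule all:
#     input:
#         list_files
#
# rule all:
#     input:
#         expand("tmp/{sample_set}/{sample}.bam", zip,
#                sample_set=paired_sets, sample=paired_samples)

# Plain-Python imitation of the zip pairing, to check both options agree.
zipped_files = [
    "tmp/{sample_set}/{sample}.bam".format(sample_set=a, sample=b)
    for a, b in zip(paired_sets, paired_samples)
]
assert zipped_files == list_files
```

Option 1 is usually enough. Option 2 is only worth the extra lists if you prefer to keep an `expand` call in `rule all`.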