Passing wildcard values in params in snakemake


I am trying to clean up a data pipeline using snakemake. Wildcards look like what I need, but I can't get them to work in params.

My function needs a parameter that depends on the wildcard value. For instance, let's say it depends on sample, which can be either A or B.

I tried the following (my real example is more complicated, but this is essentially what I am trying to do):

sample = ["A","B"]

import pandas as pd

def dummy_example(sample):
    return pd.DataFrame({"values": [0,1], "sample": sample})

rule all:
    input:
        "mybucket/sample_{sample}.csv"

rule testing_wildcards:
    output:
        newfile="mybucket/sample_{sample}.csv"
    params:
        additional="{sample}"
    run:
        df = dummy_example(params.additional)
        df.to_csv(output.newfile, index = False)

which gives me the following error:

Wildcards in input files cannot be determined from output files: 'sample'

I followed the documentation and put expand in the output section. For the params, this section and this thread seemed to give me everything I needed:

sample_list = ["A","B"]

import pandas as pd
import re

def dummy_example(sample):
    return pd.DataFrame({"values": [0,1], "sample": sample})
    
def get_wildcard_from_output(output):
    return re.search(r'sample_(.*?).csv', output).group(1)

rule all:
    input:
        expand("sample_{sample}.csv", sample = sample_list)

rule testing_wildcards:
    output:
        newfile=expand("sample_{sample}.csv", sample = sample_list)
    params:
        additional=lambda wildcards, output: get_wildcard_from_output(output)
    run:
        print(params.additional)
        df = dummy_example(params.additional)
        df.to_csv(output.newfile, index = False)

InputFunctionException in line 16 of /home/jovyan/work/Snakefile:
Error:
  TypeError: expected string or bytes-like object
Wildcards:

Is there a way to capture the value of the wildcard in params so that I can use it in run?


Solution

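  • I think you are trying to use the sample wildcard as a parameter in your script.

    A function given under params receives the job's wildcards as its first argument, called wc here. It is an instance of snakemake.io.Wildcards, which is a snakemake.io.Namedlist. You can call .get(key) on these objects, so a lambda function can generate the param.

    Define samples_from_wc=lambda wc: wc.get("sample") under params and use it in run/shell as params.samples_from_wc. Note that the output keeps the plain "sample_{sample}.csv" pattern, so the rule still has a sample wildcard; expand is only used in rule all.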

    sample_list = ["A","B"]
    
    import pandas as pd
    
    def dummy_data(sample):
        return pd.DataFrame({"values": [0, 1], "sample": sample})
    
    rule all:
        input: expand("sample_{sample}.csv", sample=sample_list)
    
    rule testing_wildcards:
        output:
            newfile="sample_{sample}.csv"
        params:
            samples_from_wc=lambda wc: wc.get("sample")
        run:
            # Build the data frame for this job's sample
            df = dummy_data(params.samples_from_wc)
            # Write output
            df.to_csv(output.newfile, index=False)
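
To make the difference between the two approaches concrete, here is a minimal plain-Python sketch that runs without Snakemake. It rests on assumptions: a plain list (expanded_output) stands in for the rule's output object after expand, and a plain dict (job_wildcards) stands in for the wildcards object of one job. Neither is Snakemake's own class, and the names expanded_output, job_wildcards, recovered and param_value are made up for illustration. The regex helper is the one from the question, with the dot before csv escaped.

```python
import re

def get_wildcard_from_output(path):
    # Helper from the question; the "." before "csv" is escaped here
    # so that it only matches a literal dot.
    return re.search(r"sample_(.*?)\.csv", path).group(1)

# Assumption: a plain list standing in for the rule's output after expand().
# It holds every file at once, so there is no single path to search.
expanded_output = ["sample_A.csv", "sample_B.csv"]

try:
    get_wildcard_from_output(expanded_output)
except TypeError as err:
    # re.search only accepts str or bytes, not a list-like object.
    error_message = str(err)

# Applied to one path at a time, the helper does recover the sample name.
recovered = [get_wildcard_from_output(p) for p in expanded_output]

# Assumption: a plain dict standing in for the wildcards of a single job.
# The params function from the answer only relies on .get("sample").
samples_from_wc = lambda wc: wc.get("sample")
job_wildcards = {"sample": "A"}
param_value = samples_from_wc(job_wildcards)

print(error_message)
print(recovered)
print(param_value)
```

The sketch suggests why the second attempt in the question raised a TypeError: with expand in output, the rule produces all files in one job and has no sample wildcard left, so the helper is handed a collection of paths instead of a string. Keeping the "sample_{sample}.csv" pattern in output, as in the answer, gives one job per sample, and the wildcard value can be read directly from wc.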