I am trying to clean up a data pipeline using snakemake. It looks like wildcards are what I need, but I can't get them to work in params.
My function needs a parameter that depends on the wildcard value. For instance, let's say it depends on sample, which can be either A or B.
I tried the following (my real example is more complicated, but this is basically what I am trying to do):

sample = ["A","B"]
import pandas as pd

def dummy_example(sample):
    return pd.DataFrame({"values": [0,1], "sample": sample})

rule all:
    input:
        "mybucket/sample_{sample}.csv"

rule testing_wildcards:
    output:
        newfile="mybucket/sample_{sample}.csv"
    params:
        additional="{sample}"
    run:
        df = dummy_example(params.additional)
        df.to_csv(output.newfile, index = False)

which gives me the following error:
Wildcards in input files cannot be determined from output files: 'sample'
I followed the docs and used expand in the output section. For params, it looked like this section and this thread gave me everything I needed:

sample_list = ["A","B"]
import pandas as pd
import re

def dummy_example(sample):
    return pd.DataFrame({"values": [0,1], "sample": sample})

def get_wildcard_from_output(output):
    return re.search(r'sample_(.*?).csv', output).group(1)

rule all:
    input:
        expand("sample_{sample}.csv", sample = sample_list)

rule testing_wildcards:
    output:
        newfile=expand("sample_{sample}.csv", sample = sample_list)
    params:
        additional=lambda wildcards, output: get_wildcard_from_output(output)
    run:
        print(params.additional)
        df = dummy_example(params.additional)
        df.to_csv(output.newfile, index = False)

This fails with:
InputFunctionException in line 16 of /home/jovyan/work/Snakefile:
Error:
  TypeError: expected string or bytes-like object
Wildcards:
Is there some way to capture the value of the wildcard in params so that I can use it in run?
I think you are trying to get the sample wildcard to use as a parameter in your script.
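The wc variable (the argument Snakemake passes to a params function) is an instance of snakemake.io.Wildcards, which is a snakemake.io.Namedlist. You can call .get(key) on these objects, so we can use a lambda function to generate the params:

samples_from_wc=lambda wc: wc.get("sample")

and use this in the run/shell as params.samples_from_wc. Note that the output has to keep the {sample} wildcard rather than an expand call, so that each job is run for a single sample; expand is only needed in rule all.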

sample_list = ["A","B"]
import pandas as pd

def dummy_data(sample):
    return pd.DataFrame({"values": [0, 1], "sample": sample})

rule all:
    input: expand("sample_{sample}.csv", sample=sample_list)

rule testing_wildcards:
    output:
        newfile="sample_{sample}.csv"
    params:
        samples_from_wc=lambda wc: wc.get("sample")
    run:
        # Load input
        df = dummy_data(params.samples_from_wc)
        # Write output
        df.to_csv(output.newfile, index=False)
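
As a side note on why the two attempts in the question failed, here is a rough sketch in plain Python (no Snakemake needed). The first error comes from rule all, whose input still contains the unresolved {sample} wildcard; expand turns the pattern into concrete file names. In the second attempt, expand in the output makes output a list-like object holding both files, so passing it to re.search raises the TypeError. The list comprehension below is only my stand-in for what expand does with a single wildcard, and sample_from_path is a hypothetical helper, not part of Snakemake.

```python
import re

sample_list = ["A", "B"]

# Plain-Python stand-in (an assumption for illustration) for
# expand("sample_{sample}.csv", sample=sample_list) with one wildcard:
# it yields one concrete path per sample.
expanded = ["sample_{sample}.csv".format(sample=s) for s in sample_list]
print(expanded)  # ['sample_A.csv', 'sample_B.csv']


def sample_from_path(path):
    """Hypothetical helper: extract the sample name from ONE path string."""
    # The dot is escaped here; in the question's pattern it matched any character.
    match = re.search(r"sample_(.*?)\.csv$", path)
    return match.group(1) if match else None


# Passing the whole list, as the params function in the question did with
# `output`, fails because re.search needs a single string.
try:
    re.search(r"sample_(.*?)\.csv$", expanded)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Applied to one path at a time, the extraction works.
samples = [sample_from_path(p) for p in expanded]
print(samples)  # ['A', 'B']
```

With the wildcard kept in the output, as in the Snakefile above, none of this parsing is needed: Snakemake already knows the sample for each job and hands it to the params function.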