I am using input functions in my Snakemake rules. Most of these functions simply look up a value in a sample sheet (a pandas data frame) derived from the PEP specification. For example:
samples = pep.sample_table

def get_image(wildcards):
    return samples.loc[wildcards.sample, "image_file"]

def get_visium_fastqs(wildcards):
    return samples.loc[wildcards.sample, "visium_fastqs"]

def get_slide(wildcards):
    return samples.loc[wildcards.sample, "slide"]

def get_area(wildcards):
    return samples.loc[wildcards.sample, "area"]
Unfortunately, input functions can only have one parameter, wildcards, which is essentially a named list of wildcards and their values. Otherwise I could define an input function something like this:
def lookup_sample_table(wildcards, target):
    return samples.loc[wildcards.sample, target]
... and then call it in a rule as:
input:
    fq=lookup_sample_table(target="visium_fastqs")
But AFAIK this is not possible.
I tried lambda functions in my rules. For example:
input:
    lambda wildcards: samples.loc[wildcards.sample, "slide"]
This works OK if the input items are not named. But I can't figure out how to create named input items using lambda functions. For example, the following doesn't work:
input:
    slide=lambda wildcards: samples.loc[wildcards.sample, "slide"]
Can I combine named inputs with lambda functions? If so, then I could extend the idea in this answer.
This is such a generic situation, I am sure that there must be a generic solution, right?
Inspired by this question I have come up with the following generic function which seems to work (so far):
def sample_lookup(pattern):
    def handle_wildcards(wildcards):
        s = pattern.format(**wildcards)
        sample, target = s.split(':')
        return samples.loc[sample, target]
    return handle_wildcards
This function is called as follows:
rule preproc:
    input:
        bam=sample_lookup('{sample}:sample_bam'),
        barcodes=sample_lookup('{sample}:sample_barcodes')
That is, sample_lookup() is given a "pattern" with the {sample} wildcard, followed by the name of the column in sample_table to look up.
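To make the two steps inside handle_wildcards concrete, here is a small trace of what happens to one pattern. The wildcards dict and the sample name "S1" are made-up stand-ins; in a real workflow Snakemake supplies the wildcards object.

```python
# Stand-in values for illustration only; Snakemake normally supplies `wildcards`.
wildcards = {"sample": "S1"}
pattern = "{sample}:sample_bam"

# Step 1: fill the wildcard values into the pattern.
s = pattern.format(**wildcards)

# Step 2: split the result into the row label and the column name.
sample, target = s.split(":")

print(s)               # the formatted lookup key
print(sample, target)  # the row label and column passed to samples.loc
```

One caveat of this encoding: a sample name that itself contains a colon would make the split produce more than two parts, so the unpacking would raise a ValueError.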
But this function definition is quite opaque compared to the simple (if repetitive) input functions that I started with, and I feel like I'm beginning to invent my own syntax, which then makes the rules harder to read.
What is the simplest way to reduce repetition and redundancy in this kind of input function?
Not sure if I am missing something, but this should give you what you want:
def lookup_sample_table(sample, target):
    return samples.loc[sample, target]

# ... then, inside a rule:
input:
    fq=lambda wc: lookup_sample_table(sample=wc.sample, target="visium_fastqs")
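If repeating the lambda in every rule feels noisy, one possible variant is a small factory that takes only the column name and returns the one-argument function that Snakemake expects. The sketch below rests on assumptions: the name lookup is hypothetical, and the _Loc class and the SimpleNamespace objects are stand-ins for pep.sample_table and Snakemake's wildcards so that the snippet runs outside a workflow.

```python
from types import SimpleNamespace


class _Loc:
    """Tiny stand-in for the pandas .loc[row, column] indexer (illustration only)."""

    def __init__(self, rows):
        self._rows = rows

    def __getitem__(self, key):
        row, column = key
        return self._rows[row][column]


# Hypothetical stand-in for `samples = pep.sample_table`, with made-up values.
samples = SimpleNamespace(loc=_Loc({"S1": {"slide": "V10-001", "area": "A1"}}))


def lookup(target):
    """Return an input function that reads column `target` for the sample wildcard."""

    def input_func(wildcards):
        return samples.loc[wildcards.sample, target]

    return input_func


# Simulate the call Snakemake would make with the rule's wildcards.
wc = SimpleNamespace(sample="S1")
print(lookup("slide")(wc))
print(lookup("area")(wc))
```

In a Snakefile this would be written as slide=lookup("slide") or area=lookup("area") inside the input section, with the real sample table in place of the stand-in. The column name is captured when the rule is parsed, and the wildcards are only supplied later when Snakemake calls the returned function.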