I currently have a snakemake workflow that requires the use of lambda wildcards, set up as follows:
Snakefile:
configfile: "config.yaml"
workdir: config["work"]
rule all:
input:
expand("logs/bwa/{ref}.log", ref=config["refs"])
rule bwa_index:
input:
lambda wildcards: 'data/'+config["refs"][wildcards.ref]+".fna.gz"
output:
"logs/bwa/{ref}.log"
log:
"logs/bwa/{ref}.log"
shell:
"bwa index {input} 2&>1 {log}"
Config file:
work: /datasets/work/AF_CROWN_RUST_WORK/2020-02-28_GWAS
refs:
12NC29: GCA_002873275.1_ASM287327v1_genomic
12SD80: GCA_002873125.1_ASM287312v1_genomic
This works, but I've had to use a hack to get the output of bwa_index
to play with the input of all
. My hack is to generate a log file as part of bwa_index
, set the log to the output of bwa_index
, and then set the input of all
to these log files. As I said, it works, but I don't like it.
The problem is that the true outputs of bwa_index
are of the format, for example, GCA_002873275.1_ASM287327v1_genomic.fna.sa
. So, to specify these output files, I would need to use a lambda function for the output, something like:
rule bwa_index:
input:
lambda wildcards: 'data/'+config["refs"][wildcards.ref]+".fna.gz"
output:
lambda wildcards: 'data/'+config["refs"][wildcards.ref]+".fna.sa"
log:
"logs/bwa/{ref}.log"
shell:
"bwa index {input} 2&>1 {log}"
and then use a lambda function with expand for the input of rule all
. However, snakemake will not accept functions as output, so I'm at a complete loss how to do this (other than my hack). Does anyone have suggestions of a sensible solution? TIA!
You can use a simple python function in the inputs (as the lambda function) so I suggest you use it for the rule all
.
configfile: "config.yaml"
workdir: config["work"]
def getTargetFiles():
targets = list()
for r in config["refs"]:
targets.append("data/"+config["refs"][r]+".fna.sa")
return targets
rule all:
input:
getTargetFiles()
rule bwa_index:
input:
"data/{ref}.fna.gz"
output:
"data/{ref}.fna.sa"
log:
"logs/bwa/{ref}.log"
shell:
"bwa index {input} 2&>1 {log}"
Careful here the wildcard {ref}
is the value and not the key of your dictionnary so your log files will finally be named "logs/bwa/GCA_002873275.1_ASM287327v1_genomic.log", etc...