I am running a Snakemake pipeline and tried to tidy it up by declaring the paths in a config file (config.yaml). The config.yaml file contains the path to the scripts that the pipeline runs, but when I expand the wildcard in the path, the pipeline does not run the script. The script itself runs fine. To illustrate my problem:
rule with_script:
    input: someinput
    output: someoutput
    script: expand("{script_path}/scriptfile", script_path = config[scriptpath])
Neither input, output, nor rule all contains the script's path wildcard, so this is the first place I declare it. The line in config.yaml that holds the path looks like this:
scriptpath: /path/to/the/script
Is there a way to keep the wildcard and the path in the config file (to make it easier for others to make changes if needed) and still have the script work? As written, Snakemake doesn't even enter the script file. Or is it possible to declare global wildcards outside of rule all?
Thank you for your help!
P.S.: I'm sorry if this question has already been answered, but I couldn't find anything to help me with this.
You cannot call a function like expand() in the script directive. Snakemake expects a single path to your script there, whereas expand() returns a list.
Like the documentation states:
The script path is always relative to the Snakefile containing the directive (in contrast to the input and output file paths, which are relative to the working directory). It is recommended to put all scripts into a subfolder "scripts"
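To illustrate the quoted rule, here is a minimal plain-Python sketch of how a relative script path would be resolved against the Snakefile's directory rather than the working directory. The helper name resolve_script_path and the example paths are hypothetical and only mimic the documented behaviour; this is not Snakemake's actual implementation. The sketch also assumes that an absolute path is used unchanged.

```python
import posixpath


def resolve_script_path(snakefile, script):
    """Hypothetical helper: mimic how a script path is interpreted."""
    # Assumption: an absolute script path is taken as-is.
    if posixpath.isabs(script):
        return script
    # A relative path is resolved against the Snakefile's directory.
    return posixpath.normpath(posixpath.join(posixpath.dirname(snakefile), script))


relative = resolve_script_path("/project/workflow/Snakefile", "scripts/scriptfile")
absolute = resolve_script_path("/project/workflow/Snakefile", "/path/to/the/script/scriptfile")
print(relative)
print(absolute)
```

Under these assumptions, an absolute path taken from config.yaml would not be affected by where the Snakefile lives.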
If you need different paths for your scripts, you can define them in Python outside of your rules. Remember that all Python code outside of rules is executed before the DAG is built, so you can define any variables you want there and use them in your rules.
SCRIPTSPATH = config["scriptpath"]

rule with_script:
    input: someinput
    output: someoutput
    # Build the path with ordinary Python; no wildcard or expand() is needed.
    script: SCRIPTSPATH + "/scriptfile"
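Outside of Snakemake, the same idea can be sketched in plain Python: the config value is read once at the top level and the full script path is built as an ordinary string before any rule uses it. The config dict below is only a stand-in for what Snakemake would load from config.yaml, and script_file is a hypothetical name.

```python
import posixpath

# Stand-in for the dict Snakemake builds from config.yaml (assumption).
config = {"scriptpath": "/path/to/the/script"}

# Top-level code like this runs before the DAG is built.
SCRIPTSPATH = config["scriptpath"]

# Build the final path as a plain string; no wildcard is involved.
script_file = posixpath.join(SCRIPTSPATH, "scriptfile")
print(script_file)
```

The point of the sketch is that the result is a single string, which is what the script directive expects.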
Note: Do not confuse wildcards with "variables". In an expand() call such as
expand("{script_path}/scriptfile", script_path = config[scriptpath])
{script_path} is not a wildcard but just a placeholder for the values given in the second parameter of the function.
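The difference can be made concrete with a simplified stand-in for expand(), written in plain Python. The function expand_sketch is hypothetical and far simpler than Snakemake's own; it only shows that the placeholders are filled from the keyword arguments and that the result is a list of strings, one per combination of values.

```python
from itertools import product


def expand_sketch(pattern, **values):
    """Hypothetical, simplified stand-in for Snakemake's expand()."""
    # Treat a single string as a one-element list of values.
    lists = {k: [v] if isinstance(v, str) else list(v) for k, v in values.items()}
    keys = list(lists)
    # Fill the pattern once per combination of the given values.
    return [
        pattern.format(**dict(zip(keys, combo)))
        for combo in product(*(lists[k] for k in keys))
    ]


single = expand_sketch("{script_path}/scriptfile", script_path="/path/to/the/script")
several = expand_sketch("{d}/{f}", d=["a", "b"], f=["x", "y"])
print(single)
print(several)
```

Because the result is a list even when only one value is given, it cannot stand in for the single path that the script directive expects.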