Let's assume I have a Snakemake rule like this:
rule test:
    input: myfile="myfile.txt",
    params: test_out="test",
    shell: "tool {input.myfile} -p ~/desktop/{params.test_out}"
The tool does not support an -o option to specify the output, but provides -p to specify the prefix of the file.
Basically the tool takes some input, processes the files and generates multiple output files. But if I have a rule all at the top of my script, it won't execute this rule since it does not include an output directive. How can I still generate an output for this rule along with the output of the other rules specified in rule all?
Thanks for your help!
You still have an output, and if you know what it is, you should tell Snakemake. As an example, let's say you knew output.txt was going to be the output:
from pathlib import Path

rule test:
    input: myfile="myfile.txt",
    output: "test/output.txt"
    # Derive the directory passed to -p from the declared output file.
    params: test_out=lambda wildcards, output: Path(output[0]).parent,
    shell: "tool {input.myfile} -p {params.test_out}"
You know the output file; you just need to translate it into the parameter the tool expects. One consideration: if the tool always writes to output.txt regardless of input, you need a unique subdirectory per sample to prevent clobbering.
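As a minimal sketch of that translation in plain Python (the function name prefix_for, the sample names and the results/ layout are hypothetical, chosen only for illustration), the directory handed to -p can be derived from each declared output path, with one subdirectory per sample:

```python
from pathlib import Path


def prefix_for(output_file: str) -> str:
    """Return the directory part of an output path, for use with the tool's -p option."""
    # as_posix() keeps forward slashes regardless of platform.
    return Path(output_file).parent.as_posix()


# Hypothetical samples: each gets its own subdirectory so that a fixed
# file name such as output.txt is never overwritten by another sample.
samples = ["a", "b"]
targets = [f"results/{s}/output.txt" for s in samples]
prefixes = [prefix_for(t) for t in targets]
print(prefixes)
```

Inside a Snakefile the same idea would live in the params lambda, and a list like targets is the kind of thing you would put in the input of rule all so the rule actually gets scheduled.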
It may also be that the tool produces an indeterminate number of files, e.g. output_{1..n}.txt, but you can specify the directory. Then you'd have
rule test:
    input: myfile="myfile.txt",
    output: directory("test")
    shell: "mkdir -p {output} && tool {input.myfile} -p {output}"
With directory outputs, Snakemake no longer auto-creates parent directories, so you have to do that manually. Any rules that consume the tool's outputs may also need the rule to be a checkpoint so the outputs can be queried before execution.
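For the checkpoint case, a downstream input function typically has to list whatever files ended up in the directory. The helper below is only a sketch in plain Python: the name list_tool_outputs and the output_*.txt pattern are assumptions based on the example file names above, and the temporary directory merely stands in for the tool's real output.

```python
import tempfile
from pathlib import Path


def list_tool_outputs(directory: str, pattern: str = "output_*.txt") -> list[str]:
    """Return the files in `directory` that match `pattern`, in sorted order."""
    return sorted(p.as_posix() for p in Path(directory).glob(pattern))


# Stand-in for a finished tool run: three outputs plus an unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    for i in (1, 2, 3):
        Path(tmp, f"output_{i}.txt").touch()
    Path(tmp, "tool.log").touch()
    found = [Path(p).name for p in list_tool_outputs(tmp)]

print(found)
```

In a Snakefile, such a helper would be called from the consuming rule's input function on the directory returned by checkpoints.test.get(**wildcards).output[0], assuming the rule above is declared as a checkpoint named test.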