snakemake rule without an output directive


Let's assume I have a Snakemake rule like this:

rule test:
  input: myfile="myfile.txt",
  params: test_out = "test",
  shell:  "tool {input.myfile} -p ~/desktop/{params.test_out}"

The tool does not support an -o option to specify the output, but provides -p to specify the prefix of the file.

Basically the tool takes some input, processes the files and generates multiple output files. But if I have a rule all at the top of my script, it won't execute this rule since it does not include an output directive. How can I still generate an output for this rule along with the output of the other rules specified in rule all?

Thanks for your help!


Solution

  • You still have an output, and if you know it you should tell snakemake. As an example, let's say you knew output.txt was going to be the output:

    from pathlib import Path

    rule test:
      input: myfile="myfile.txt",
      output: "test/output.txt"
      # Wrap the path in Path() first, then take its parent directory
      params: test_out = lambda wildcards, output: Path(output[0]).parent,
      shell:  "tool {input.myfile} -p {params.test_out}"
    

    You know the output file; you just need to translate it into the parameter the tool expects. One consideration: if the tool always writes to output.txt regardless of input, each sample needs its own subdirectory to prevent clobbering.
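
As a minimal sketch of that translation in plain Python, assuming the tool writes into whatever directory `-p` names: the helper name `prefix_from_output` is hypothetical (not part of Snakemake), and in a Snakefile it could be passed as `params: test_out=prefix_from_output` instead of the lambda.

```python
from pathlib import Path


def prefix_from_output(wildcards, output):
    """Return the directory that holds the first declared output file."""
    # Build a Path from the string first, then ask for its parent.
    return str(Path(output[0]).parent)


# Stand-alone check, using a plain list in place of Snakemake's output object.
print(prefix_from_output(None, ["test/output.txt"]))  # prints: test
```

Returning a string rather than a `Path` is only a convenience for the stand-alone check; either should format the same way inside a shell command.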

    It may also be that the tool produces an indeterminate number of files, e.g. output_{1..n}.txt, but lets you specify the directory. Then you'd have:

    rule test:
      input: myfile="myfile.txt",
      output: directory("test")
      shell:  "mkdir -p {output} ; tool {input.myfile} -p {output}"
    

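    With directory outputs, Snakemake does not create the output directory for you, so the rule has to do that itself (hence the mkdir -p). If downstream rules consume the individual files the tool writes, this rule may also need to be a checkpoint, so that Snakemake can inspect the directory contents after the rule has run and before it schedules those downstream rules.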
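
Once the directory exists, something downstream has to find out which files the tool actually wrote. Here is a rough sketch of that lookup in plain Python, assuming the output_{1..n}.txt naming mentioned above. `list_tool_outputs` is a hypothetical helper, not a Snakemake function; in a Snakefile it would typically be called from an input function, on the directory obtained from the finished checkpoint.

```python
from pathlib import Path


def list_tool_outputs(directory, pattern="output_*.txt"):
    """Return the files matching `pattern` inside `directory`, sorted by name."""
    # The pattern is an assumption about how the tool names its files.
    return sorted(str(p) for p in Path(directory).glob(pattern))
```

Because the result depends on what is on disk, it is only meaningful after the rule that produces the directory has run, which is the situation checkpoints are meant for.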