Can snakemake create job error and output files automatically on a SLURM cluster?


I work on GNU/Linux Ubuntu 16.04.5.

I have the following rule in a Snakefile:

rule cutadapt:
    input:
        reads = '{path2reads}/raw/reads.fq'
    output:
        trimmed = '{path2reads}/trimmed/reads.fq'
    shell:
        "cutadapt -q 20 --minimum-length 40 --output {output.trimmed} {input.reads}"

Then, in my slurm.json file, I have:

...
    "output": "output/log/job/output/{rule}%A.o",
    "error": "output/log/job/error/{rule}%A.e",
...
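
For reference, the surrounding structure of such a cluster-config file looks roughly like the sketch below (the __default__ resource values are placeholders I have added for illustration; whether the output/error patterns sit under __default__ or under a specific rule is an assumption here):

{
    "__default__" :
    {
        "time"   : "01:00:00",
        "mem"    : "4G",
        "output" : "output/log/job/output/{rule}%A.o",
        "error"  : "output/log/job/error/{rule}%A.e"
    }
}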

In the Snakefile, I create the folders output/log/job/{error,output}.
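
For illustration, a minimal sketch of how these directories can be created at the top of the Snakefile (a Snakefile accepts plain Python, so this only uses the standard library):

import os

# Pre-create the SLURM job log directories so that sbatch can write its
# --output/--error files there; exist_ok avoids errors on reruns.
for d in ('output/log/job/output', 'output/log/job/error'):
    os.makedirs(d, exist_ok=True)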

This works fine, presumably because snakemake does not have to create new folders in order to store the error and output from the job, which I run like this:

snakemake output/reads/trimmed/reads.fq --cluster-config slurm.json --cluster "sbatch ... --output {cluster.output} --error {cluster.error} ..."

So path2reads will be evaluated to output/reads.

Note that I have omitted parameters which I deemed irrelevant for this discussion.

However, I would like SLURM to store the job error and output files as output/log/job/error/{rule}{wildcards}.e and output/log/job/output/{rule}{wildcards}.o. If I put those paths in my slurm.json file, the job fails. This ...{rule}{wildcards}... structure worked for other rules which did not need new folders to be created (because their wildcards did not contain a folder path).

How can I get around this problem? I know I could figure out all the folders beforehand and create them before running snakemake, but this seems inefficient. Isn't there a feature in snakemake that does this for me? After all, snakemake creates all output, benchmark and log folders if they don't exist. Why doesn't it do the same for SLURM error and output files?

24th April 2019 update based on Johannes Koester's reply:

I have changed my rule to:

rule cutadapt:
    input:
        reads = '{path2reads}/raw/reads.fq'
    output:
        trimmed = '{path2reads}/trimmed/reads.fq'
    log:
        output = 'output/log/snakemake/output/cutadapt/path2reads={path2reads}.o',
        error = 'output/log/snakemake/error/cutadapt/path2reads={path2reads}.e',
        jobError = 'output/log/job/error/cutadapt/path2reads={path2reads}.e',
        jobOutput = 'output/log/job/output/cutadapt/path2reads={path2reads}.o',
    shell:
        'cutadapt -q 20 --minimum-length 40 --output {output.trimmed} {input.reads} > {log.output} 2> {log.error}'

and run the following snakemake command:

snakemake paths/2/reads/trimmed/reads.fq --cluster-config slurm.json --cluster "sbatch ... --output {cluster.output} --error {cluster.error} ..."

My jobs fail and some log directories are missing. I see the directories output/log/job/{error,output} but they are empty. I do not see the directory output/log/snakemake. However, if I first create the directories output/log/{job,snakemake}/{error,output}/cutadapt/path2reads=path/2/reads/, then the jobs are successful.
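
In shell terms, the manual workaround that makes the jobs succeed is roughly this (bash brace expansion, echoing the directories listed above):

mkdir -p output/log/{job,snakemake}/{error,output}/cutadapt/path2reads=path/2/reads/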

If I run snakemake on the head node, it also works. Note that my slurm.json has the following parameters for fastqc (default parameters not shown):

"fastqc" :
      {
          "output" : "output/log/job/output/{rule}/{wildcards}.o",
          "error"  : "output/log/job/error/{rule}/{wildcards}.e"
      },

Do you know what could be going wrong?


Solution

  • There is no way to ensure this via snakemake. However, the problem does not occur at all if you use the logging support of snakemake itself, which has the additional benefit of being independent of the execution platform: https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files

    In that case, directories will of course be created by snakemake if they are not already present. Moreover, error messages will directly point you to the right log file.
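
    As an illustration only, the rule from the question could be reduced to something like the sketch below, relying solely on the log directive (the log path is made up; adapt it to your layout). Snakemake creates the missing log directories itself before the job runs and points to this file when the rule fails:

    rule cutadapt:
        input:
            reads = '{path2reads}/raw/reads.fq'
        output:
            trimmed = '{path2reads}/trimmed/reads.fq'
        log:
            # snakemake creates the parent directories of this file,
            # just as it does for output files
            'output/log/snakemake/cutadapt/path2reads={path2reads}.log'
        shell:
            'cutadapt -q 20 --minimum-length 40 --output {output.trimmed} {input.reads} > {log} 2>&1'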