
Snakemake: temp files lead to immediate failure in repeated rule attempts


I have a rule using the bwa mem wrapper that sometimes fails due to cluster time limits. As this only happens occasionally, I do not want to increase the time limit for that job in general, but instead increase it with the number of attempts.
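Scaling the limit per attempt itself is not the problem; I do it roughly like this (the resource name runtime and the 240-minute base value are just examples that depend on the cluster profile, and restarts are enabled, e.g. via --restart-times):

    resources:
        # first attempt gets 240 min, the second 480 min, and so on
        runtime=lambda wildcards, attempt: 240 * attempt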

However, after failing due to cluster time limit, a lot of bwa mem tmp files are left in the output directory, which cause bwa mem to immediately fail in the next attempt. The generated tmp files are numbered out.tmp.1.bam .. out.tmp.n.bam, where n is some number as bwa mem sees fit, so I cannot simply mark these as temp files in Snakemake and rely on them being deleted on failure (I'm not even sure that this would happen - I don't know exactly when the deletion of files marked as temp is triggered...).

I considered the following solutions:

  • Delete these files first (by not using the wrapper, but instead copying the wrapper code and modifying it to delete all out.tmp.*.bam files before running bwa mem), but this seems ugly; see the sketch after this list.

  • Use a shadow directory, in the hope that this directory is cleared after each attempt, but the documentation says

    Shadow directories are stored one per rule execution in .snakemake/shadow/, and are cleared on successful execution.

    Hence, for a failed execution, the temp files would still be there, which will cause subsequent attempts to fail as well. I guess that this is done in order to be able to debug failed runs. But here, it hinders restarts.

  • An alternative solution would be to have onstart, onsuccess, and onerror hooks per rule, as previously suggested in #133, but that is an option for the future...
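
The cleanup variant from the first bullet would look roughly like this, bypassing the wrapper and using a plain shell command instead (reference path, read naming, and thread count are placeholders; I'm assuming the tmp files come from samtools sort using the output name as prefix):

    rule bwa_mem:
        input:
            reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
        output:
            "mapped/{sample}.bam",
        threads: 8
        shell:
            # remove leftover sort tmp files from a previous failed attempt,
            # then redo the mapping and sorting from scratch
            "rm -f {output}.tmp.*.bam && "
            "bwa mem -t {threads} genome.fa {input.reads} "
            "| samtools sort -T {output}.tmp -o {output} -"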

I have already posted a feature request for this problem, but maybe there is a pure Snakemake solution out there. Any help appreciated!

Thanks, Lucas


Solution

  • Update: From a bit more experimentation, it seems that shadow: "full" does the job, and indeed also deletes the files when the job fails. I'm not entirely sure, though, and the documentation is not clear on that. But so far, it works.
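
    For completeness, the rule now looks roughly like this (the wrapper version, reference inputs, and paths are placeholders and depend on the wrapper release you use):

        rule bwa_mem:
            input:
                reads=["reads/{sample}.1.fastq", "reads/{sample}.2.fastq"],
                # how the reference/index is passed depends on the wrapper version
            output:
                "mapped/{sample}.bam",
            log:
                "logs/bwa_mem/{sample}.log",
            threads: 8
            resources:
                # grow the time limit with each restart attempt
                runtime=lambda wildcards, attempt: 240 * attempt,
            # run the job in an isolated shadow copy of the working directory,
            # so leftover out.tmp.*.bam files from a failed attempt never end
            # up next to the real output and the next attempt starts clean
            shadow:
                "full"
            wrapper:
                "v1.25.0/bio/bwa/mem"  # placeholder wrapper version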