I wrote a Nextflow workflow that performs five steps. It is meant to be used by me and my colleagues, but not everyone is skilled with Nextflow, so I decided to write a small Python 3 wrapper that runs it for them.
The wrapper is very simple: it reads the options with argparse and creates a command to be run with subprocess.run(). The issue with the wrapper is that, once the first step of the pipeline is completed, subprocess.run() thinks that the process is over.
I tried using shell=True, and I tried using subprocess.Popen() with a while loop waiting for an output file, but neither solved it.
How can I tell either subprocess.run() to wait until the very end, or nextflow run not to emit an exit code until the last step? Is there a way, or am I better off giving my colleagues a NF tutorial instead?
EDIT: The reason I prefer the wrapper is that Nextflow creates lots of temporary files which one has to know how to clean up. The wrapper does it for them, saving disk space and my time.
The first part of your question is a bit tricky to answer without the details, but subprocess.run() does wait for the specified command to complete before returning. If your nextflow command is actually exiting before all of your tasks/steps have completed, then there could be a problem with the workflow or with the version of Nextflow itself. Since this occurs after the first process completes, I would suspect the former. My guess is that there is a plumbing issue somewhere. For example, if your second task/step definition is conditional in any way, this could allow an early exit from your workflow.
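To check that the wrapper itself is not the culprit, here is a minimal sketch of the kind of wrapper described in the question. The function names, the main.nf script name, and the profile name are hypothetical; -profile and -resume are standard nextflow run options. The demo at the bottom uses a short Python child process as a stand-in for Nextflow, only to show that subprocess.run() blocks until the child exits and then reports its exit code.

```python
import subprocess
import sys
import time


def build_nextflow_command(pipeline, profile=None, resume=False):
    """Build the argument list for `nextflow run` (hypothetical helper)."""
    cmd = ["nextflow", "run", pipeline]
    if profile:
        cmd += ["-profile", profile]
    if resume:
        cmd.append("-resume")
    return cmd


def run_and_wait(cmd):
    """Run cmd, block until it exits, and return its exit code."""
    # Passing a list with the default shell=False avoids shell quoting issues.
    completed = subprocess.run(cmd)
    return completed.returncode


if __name__ == "__main__":
    # Stand-in child process (not Nextflow): sleeps briefly, then exits.
    child = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    start = time.monotonic()
    code = run_and_wait(child)
    elapsed = time.monotonic() - start
    print(f"exit code {code} after {elapsed:.2f}s")

    # What the wrapper would pass to run_and_wait() for a real pipeline:
    print(build_nextflow_command("main.nf", profile="standard", resume=True))
```

If run_and_wait() returns 0 shortly after the first step when given the real command, the nextflow process itself has finished, which points at the workflow definition rather than at Python.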
I would avoid the wrapper here. Running Nextflow pipelines should be easy, and the documentation that accompanies your workflow should be sufficient to get it up and running quickly. If you need to set multiple params on the command line, you could include one or more configuration profiles to make it easy for your colleagues to get started running it. The section on pipeline sharing is also worth reading if you haven't seen it already. If the workflow does create lots of temporary files, just ensure these are all written to the working directory. So upon successful completion, all you need to clean up should be a simple rm -rf ./work. I tend to avoid automating destructive commands like this to avoid accidental deletes. A line in your workflow's documentation to say that the working directory can be removed (following successful completion of the pipeline) should be sufficient in my opinion and just leave it up to the users to clean up after themselves.
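If you do keep the wrapper and want it to clean up, a cautious sketch might look like the following. It assumes the default ./work directory; the function name and the confirm flag are hypothetical. The directory is removed only when the run exited with code 0 and the caller explicitly opted in, so a failed run keeps its work directory and can still be resumed.

```python
import shutil
from pathlib import Path


def cleanup_work_dir(returncode, work_dir="work", confirm=False):
    """Delete work_dir only after a successful run and explicit opt-in.

    Returns True if the directory was removed, False otherwise.
    """
    path = Path(work_dir)
    if returncode != 0:
        # Failed run: keep intermediate files for debugging or resuming.
        return False
    if not confirm:
        # The caller did not opt in to the destructive step.
        return False
    if not path.is_dir():
        # Nothing to remove.
        return False
    shutil.rmtree(path)
    return True
```

In the wrapper, returncode would be the value returned by subprocess.run(...).returncode, and confirm could be tied to an explicit command-line flag so that nothing is deleted by default.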
EDIT: You may also be interested in this project: https://github.com/goodwright/nextflow.py