
Error checking for the presence of a folder in a Nextflow process


I'm trying to write a workflow using Nextflow. In my first process, I need to create a folder to hold all the files the process will create.

Here is my process :

params.fast5       = "/scratch/use/fast5"
params.refSeq      = "/scratch/user/reference_sequences/NC_012920_OFFSET_2652.fa"
params.guppy_out   = "/scratch/user/GPU/output_8"

process guppy {
    input:
            path fast5
            path guppy_out
            path refSeq
    output:
            path "guppy.info"
    script:
            """
            if [ -d "${guppy_out}" ] || test -f "${guppy_out}"
            then
                    echo "Already exist" > debug.guppy
            else
                    mkdir -p "${guppy_out}"
            fi

            job_id="\$(sbatch --parsable /scratch/user/scripts/guppy_script.sh $fast5 $guppy_out $refSeq)"
            echo "\${job_id}" > guppy.info
            """
}

My error is :

mkdir: cannot create directory 'output_8': File exists

But when I check /scratch/user/GPU, there is no folder or file named output_8...

I tried replacing "${guppy_out}" with its value, "/scratch/user/GPU/output_8", and I get a new error:

boost::filesystem::create_directories: File exists [system:17]: "output_8", "output_8"

How can I proceed here so that I can correctly verify the presence of such a file and create it if necessary?


Solution

  • Usually you don't need to contend with such problems, since Nextflow already does this for you: Nextflow processes are executed independently and are isolated from each other inside the working directory (i.e. ./work). Note that Nextflow can also be configured to use the SLURM executor by adding the following to your nextflow.config:

    process {
    
      executor = 'slurm'
    }
    

    So my guess is that, with the above, the following might also work for you:

    params.fast5       = './scratch/user/fast5'
    params.refSeq      = './scratch/user/reference_sequences/NC_012920_OFFSET_2652.fa'
    params.outdir      = './results'
    
    process guppy {
        
        publishDir "${params.outdir}/guppy", mode: 'copy'
    
        input:
        path fast5
        path refSeq
        
        output:
        path "output"
        
        """
        mkdir output
        guppy_script.sh "${fast5}" "output" "${refSeq}"
        """
    }
    
    workflow {
    
        fast5_dir = file( params.fast5 )
        fasta_file = file( params.refSeq )
    
        guppy( fast5_dir, fasta_file )
    }
    

    The above assumes that you can move or copy your shell script into a folder called 'bin' in the root directory of your project repository. Nextflow automatically adds this folder to the PATH environment variable in the execution environment. If the script is not already executable, you can make it executable with chmod +x guppy_script.sh, for example.
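
The 'bin' setup described above can be sketched roughly as follows. This is only an illustration in a throwaway directory: `project_dir` and the one-line placeholder script are hypothetical stand-ins for your real project root and your real guppy_script.sh, and the PATH change in the subshell merely imitates what Nextflow does for the task environment.

```shell
# Hypothetical project root, created in a temp dir for illustration only.
project_dir="$(mktemp -d)"

# Placeholder standing in for the real guppy_script.sh (an assumption).
printf '#!/usr/bin/env bash\necho "guppy placeholder: $*"\n' > "$project_dir/guppy_script.sh"

# Move the script into bin/ and make it executable.
mkdir -p "$project_dir/bin"
mv "$project_dir/guppy_script.sh" "$project_dir/bin/"
chmod +x "$project_dir/bin/guppy_script.sh"

# Imitate the task environment: with bin/ on PATH, the script can be
# called by name, as in the process script block above.
result="$(
    export PATH="$project_dir/bin:$PATH"
    guppy_script.sh fast5 output ref.fa
)"
echo "$result"
```

In a real project you would run only the mkdir, mv and chmod steps from the repository root; Nextflow takes care of PATH.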

    However, I would avoid using the above code. It's likely that the contents of your shell script can be included directly in your Nextflow script block. This would let you avoid creating an 'output' directory (which appears superfluous) and would let you declare each of the guppy outputs rather than the top-level directory itself. Happy to make some additional suggestions if you can describe a bit more about what exactly you're trying to do.
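
As for the original "File exists" error, one plausible cause (an assumption, since the workflow call is not shown in the question) is that Nextflow stages each `path` input as a symlink inside the task's working directory. If /scratch/user/GPU/output_8 does not exist yet, the staged `output_8` would be a dangling symlink: `[ -d ]` and `[ -f ]` follow the link and report false, yet `mkdir -p` fails because the name is already taken by the link. The sketch below reproduces that behaviour with plain shell, without Nextflow; the function name and the temp directory are hypothetical.

```shell
# Reproduce the suspected situation in a throwaway directory.
# The function body runs in a subshell, so the cd does not leak out.
demo_dangling_link() (
    tmp="$(mktemp -d)" || exit 1
    cd "$tmp" || exit 1

    # Simulate a staged input: a symlink whose target does not exist.
    ln -s "$tmp/missing/output_8" output_8

    # -d and -f follow the symlink, so both are false for a dangling link.
    if [ -d output_8 ] || [ -f output_8 ]; then
        exists="yes"
    else
        exists="no"
    fi

    # -L tests the link itself rather than its target.
    if [ -L output_8 ]; then
        is_link="yes"
    else
        is_link="no"
    fi

    # mkdir -p cannot reuse the name because the link already occupies it.
    if mkdir -p output_8 2>/dev/null; then
        mkdir_result="created"
    else
        mkdir_result="failed"
    fi

    echo "exists=$exists is_link=$is_link mkdir=$mkdir_result"

    # Clean up the temp directory.
    cd / && rm -rf "$tmp"
)

demo_dangling_link
```

If this is indeed what happens in your process, it is another reason not to pass the output directory as a `path` input, and to let the process write into its own working directory instead.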