groovy, nextflow

For loop over files ending with a given extension in Nextflow and Groovy


I want to write a simple Nextflow pipeline in which a single job runs multiple tasks one after another.

Here is my file so far:

// 1. Define the input directory
params.input_dir = "path1/path2/path3/Data"

// 2. Define the output directory
params.output_dir = "path1/path2/path3/Output"

// 3. Define the process using nextflow.enable.dsl=2

process RUN_KRAKEN2 {

    publishDir params.output_dir, mode: 'copy'

    // 3.2 Define the output file
    output:
    path("Kraken2_iterations.check")

    // 3.3 Define the script
    script:
    """
    echo "Making directory"
    mkdir -p ${params.output_dir}
    for (file in file("$params.input_dir").list()) { # for each file in params.input_dir
     if (file.endsWith(".m8")) { #if it ends with *.m8
         #then do this task
         task1
         #and finaly run this last task
         task2
     }

    """
}


// 4. Run the workflow
workflow {
    RUN_KRAKEN2()
}

But there are issues in this code. I'm really new to Nextflow and especially Groovy. I've attempted to comment on each line to explain what I wanted to achieve. If someone could correct the code to be written properly, it would be amazing.

Additionally, as you can see, I've defined an output file called 'Kraken2_iterations.check'. I would like to create such a file at the end of the script, after every *.m8 file has been processed. Does anyone have an idea? I've thought of using a variable 'n=0' and then incrementing 'n' by 1 with each iteration. Then, I could use an 'if' statement like 'if n < count(files ending with *.m8)', but I'm unsure how to implement this in Groovy.


Solution

  • You can use the Groovy method eachFileRecurse(...) to walk a folder recursively and pick out all files of a specific type. This answer has an example: Recursive listing of all files matching a certain filetype in Groovy

    --- Updated ---

    I have this project structure:

    ├── data
    │   ├── data.m8
    │   └── input
    │       ├── in.m8
    │       └── out.m8
    ├── find-files
    │   └── main.nf
    └── nextflow
    

    main.nf:

    #!/usr/bin/env nextflow
    
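    process RUN_KRAKEN2 {
    
        script:
        // Groovy code placed before the script string is evaluated by Nextflow itself,
        // so '.' here is the directory Nextflow was launched from.
        def out = ''
        new File('.').eachFileRecurse(groovy.io.FileType.FILES) {
            if(it.name.endsWith('.m8')) {
                println it
                // Paths are appended one after another, with no separator
                out += it
            }
        }
    
        // The shell commands below run in the task's work directory
        """
        touch done
        echo ${out} >> done
        """
    }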
    
    workflow {
        RUN_KRAKEN2()
    }
    

    Running it with:

    ./nextflow run find-files/
    

    generates a file named done in the work directory (e.g. work/27/e1994e62df22bb5c7cb0ec3ef1f2cd/done) containing the paths of the files found, concatenated without a separator:

    ./data/data.m8./data/input/in.m8./data/input/out.m8
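
The question's loop can also be written in plain shell, since the `script:` string is handed to the shell as-is. The sketch below is only an illustration under a few assumptions: `process_m8_dir` is a hypothetical helper name, and the two `echo` lines stand in for the real `task1` and `task2` commands, which the question does not specify. It shows that the check file can simply be written after the loop ends, so no counter comparison is needed.

```shell
set -eu

# Hypothetical helper: loop over every *.m8 file in a directory,
# run two placeholder steps per file, then write a check file.
process_m8_dir() {
    m8_input_dir="$1"
    m8_check_file="$2"
    m8_count=0

    for m8_file in "$m8_input_dir"/*.m8; do
        # If nothing matches, the pattern stays literal; skip it
        [ -e "$m8_file" ] || continue
        echo "task1 on $m8_file"   # placeholder for the real first command
        echo "task2 on $m8_file"   # placeholder for the real second command
        m8_count=$((m8_count + 1))
    done

    # This line is reached only once the loop has finished
    echo "processed $m8_count file(s)" > "$m8_check_file"
}
```

Inside the process this could be called as `process_m8_dir "${params.input_dir}" Kraken2_iterations.check`, which writes the check file into the task work directory where the `output:` declaration expects it. Note that in a `"""` script string Nextflow interpolates `$name` itself, so shell variables such as `$m8_file` would have to be written as `\$m8_file` there.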