I want to write a simple Nextflow pipeline in which a single job runs multiple tasks one after another.
Here is my file so far:
// 1. Define the input directory
params.input_dir = "path1/path2/path3/Data"
// 2. Define the output directory
params.output_dir = "path1/path2/path3/Output"
// 3. Define the process using nextflow.enable.dsl=2
process RUN_KRAKEN2 {
    publishDir params.output_dir, mode: 'copy'
    // 3.2 Define the output file
    output:
    path("Kraken2_iterations.check")
    // 3.3 Define the script
    script:
    """
    echo "Making directory"
    mkdir -p ${params.output_dir}
    for (file in file("$params.input_dir").list()) { # for each file in params.input_dir
        if (file.endsWith(".m8")) { # if it ends with *.m8
            # then do this task
            task1
            # and finally run this last task
            task2
        }
    """
}
// 4. Run the workflow
workflow {
    RUN_KRAKEN2()
}
But there are issues in this code. I'm really new to Nextflow and especially Groovy. I've attempted to comment on each line to explain what I wanted to achieve. If someone could correct the code to be written properly, it would be amazing.
Additionally, as you can see, I've defined an output file called 'Kraken2_iterations.check'. I would like to create such a file at the end of the script, after every *.m8 file has been processed. Does anyone have an idea? I've thought of using a variable 'n=0' and then incrementing 'n' by 1 with each iteration. Then, I could use an 'if' statement like 'if n < count(files ending with *.m8)', but I'm unsure how to implement this in Groovy.
You can use the Groovy method eachFileRecurse(...) to walk a folder recursively and pick out all files of a specific file type. This answer has an example:
Recursive listing of all files matching a certain filetype in Groovy
--- Updated ---
I have this project structure:
├── data
│   ├── data.m8
│   └── input
│       ├── in.m8
│       └── out.m8
├── find-files
│   └── main.nf
└── nextflow
main.nf:
#!/usr/bin/env nextflow
process RUN_KRAKEN2 {
    script:
    def out = ''
    new File('.').eachFileRecurse(groovy.io.FileType.FILES) {
        if(it.name.endsWith('.m8')) {
            println it
            out += it
        }
    }
    """
    touch done
    echo ${out} >> done
    """
}
workflow {
    RUN_KRAKEN2()
}
Which when executed:
./nextflow run find-files/
generates a file named done in the work directory (e.g. work/27/e1994e62df22bb5c7cb0ec3ef1f2cd/done) containing the paths of the files found, run together because out += it adds no separator:
./data/data.m8./data/input/in.m8./data/input/out.m8
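If you would rather keep the loop in the shell part of the process, as the question attempted, the same idea could be sketched in plain shell. This is only a sketch under assumptions: process_m8_dir is a hypothetical helper name, the echo lines stand in for task1 and task2 from the question, and the counter n is the one the question proposed. Note that inside a Nextflow """ script block every shell $ has to be escaped as \$ (or the block written with ''' instead), otherwise Groovy tries to interpolate it.

```shell
# Hypothetical helper: loop over every *.m8 file in a directory,
# then write a check file once the loop has finished.
process_m8_dir() {
    input_dir="$1"
    check_file="$2"
    n=0

    for f in "$input_dir"/*.m8; do
        # If nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$f" ] || continue

        # Placeholder for task1, run once per file.
        echo "processing $f"
        n=$((n + 1))
    done

    # Placeholder for task2: runs once, after all files were handled.
    echo "processed $n file(s)" > "$check_file"
}
```

Calling process_m8_dir path1/path2/path3/Data Kraken2_iterations.check would then create the check file in the current directory. Because the file is written after the loop, no comparison of n against the number of files is needed; the counter is only recorded for information. For Nextflow to pick the file up as the declared output, it has to be created in the task's work directory.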