Is there a way to handle a varying number of inputs/outputs in Nextflow? In the example below, process 'foo' will sometimes have three inputs (and therefore create three PNGs that need to be stitched together by 'bar'), but other times there will be two or four. I'd like process 'bar' to be able to combine all of the files in 'foo.out.files' regardless of how many there are. As it stands, this only works if there are exactly three inputs in params.input, not two or four.
Thanks!
#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process foo {
    input:
    path input_file

    output:
    path '*.png', emit: files

    """
    script that creates variable number of png files
    """
}

process bar {
    input:
    tuple path(file_1), path(file_2), path(file_3)

    """
    script that combines png files ${file_1} ${file_2} ${file_3}
    """
}

workflow {
    foo(params.input)
    bar(foo.out.files.collect())
}
UPDATE: I'm getting 'Input tuple does not match input set cardinality' errors for this, for example:
params.num_files = 3

process foo {
    input:
    val num_files

    output:
    path '*.png', emit: files

    """
    touch \$(seq -f "%g.png" 1 ${num_files})
    """
}

process bar {
    debug true

    input:
    tuple val(word), path(png_files)

    """
    echo "${word} ${png_files}"
    """
}

workflow {
    foo( params.num_files )

    words = Channel.from('a','b','c')

    words
        .combine(foo.out.files)
        .set { combined }

    bar(combined)
}
You don't actually need the tuple qualifier here: the path input qualifier can also handle a collection of files. If you use a variable or the * wildcard, the original filenames will be preserved. In the example below, the variable refers to all of the files in the list, but you can also access specific entries if you need to. For example:
params.num_files = 3

process foo {
    input:
    val num_files

    output:
    path '*.png', emit: files

    """
    touch \$(seq -f "%g.png" 1 ${num_files})
    """
}

process bar {
    debug true

    input:
    path png_files

    script:
    def first_png = png_files.first()
    def last_png = png_files.last()
    """
    echo ${png_files}
    echo "${png_files[0]}"
    echo "${first_png}, ${last_png}"
    """
}

workflow {
    bar( foo( params.num_files ) )
}
Results:
$ nextflow run main.nf
N E X T F L O W ~ version 22.04.0
Launching `main.nf` [mighty_solvay] DSL2 - revision: 662b108e42
executor > local (2)
[e0/a619c4] process > foo [100%] 1 of 1 ✔
[ba/5f8032] process > bar [100%] 1 of 1 ✔
1.png 2.png 3.png
1.png
1.png, 3.png
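One caveat with first() and last(): when only a single file is staged, the variable holds one path rather than a list. More recent Nextflow releases (23.10 and later, as far as I know; not the 22.04.0 used here) accept an arity option on path inputs, and with a range such as '1..*' the variable should always be a list. A minimal sketch under that assumption:

```nextflow
params.num_files = 3

process foo {
    input:
    val num_files

    output:
    path '*.png', emit: files

    """
    touch \$(seq -f "%g.png" 1 ${num_files})
    """
}

process bar {
    debug true

    input:
    // Assumes a Nextflow release that supports the 'arity' option.
    // '1..*' means one or more files, staged as a list even if only one arrives.
    path(png_files, arity: '1..*')

    script:
    def first_png = png_files.first()
    def last_png = png_files.last()
    """
    echo "${first_png}, ${last_png}"
    """
}

workflow {
    bar( foo( params.num_files ) )
}
```

On older releases without this option, you would need to handle the single-file case yourself.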
If you need to avoid potential filename collisions, you can have Nextflow rewrite the input filenames using a name pattern. If the name pattern is a simple string and a collection of files is received, a numerical suffix representing each file's ordinal position in the list is appended to the name. A single file is staged under the pattern name without a suffix. For example, if we change the 'bar' process definition to:
process bar {
    debug true

    input:
    path 'png_file'

    """
    echo png_file*
    """
}
We get:
$ nextflow run main.nf
N E X T F L O W ~ version 22.04.0
Launching `main.nf` [marvelous_bhaskara] DSL2 - revision: 980a2d067f
executor > local (2)
[f8/190e2c] process > foo [100%] 1 of 1 ✔
[71/e53b05] process > bar [100%] 1 of 1 ✔
png_file1 png_file2 png_file3
$ nextflow run main.nf --num_files 1
N E X T F L O W ~ version 22.04.0
Launching `main.nf` [nauseous_brattain] DSL2 - revision: 980a2d067f
executor > local (2)
[ce/2ba1b1] process > foo [100%] 1 of 1 ✔
[a2/7b867e] process > bar [100%] 1 of 1 ✔
png_file
Note that the * and ? wildcards can be used to control the names of the staged files. There is a table in the documentation that describes how the wildcards are to be replaced depending on the cardinality of the collection. For example, if we again change the 'bar' process definition to:
process bar {
    debug true

    input:
    path 'file*.png'

    """
    echo file*.png
    """
}
We get:
$ nextflow run main.nf
N E X T F L O W ~ version 22.04.0
Launching `main.nf` [small_poincare] DSL2 - revision: b106710bc6
executor > local (2)
[7c/cf38b8] process > foo [100%] 1 of 1 ✔
[a6/8cb817] process > bar [100%] 1 of 1 ✔
file1.png file2.png file3.png
$ nextflow run main.nf --num_files 1
N E X T F L O W ~ version 22.04.0
Launching `main.nf` [friendly_pasteur] DSL2 - revision: b106710bc6
executor > local (2)
[59/2b235b] process > foo [100%] 1 of 1 ✔
[2f/76e4e2] process > bar [100%] 1 of 1 ✔
file.png
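As for the 'Input tuple does not match input set cardinality' error in the update: my understanding is that combine flattens any list it receives, so the three files arrive as three separate tuple elements instead of one. A minimal sketch of one way around this, assuming the 'foo' and 'bar' processes from the update are otherwise unchanged, is to wrap the file list in another list before combining. The channel name wrapped_files is just illustrative, and I've used Channel.of in place of the deprecated Channel.from:

```nextflow
params.num_files = 3

process foo {
    input:
    val num_files

    output:
    path '*.png', emit: files

    """
    touch \$(seq -f "%g.png" 1 ${num_files})
    """
}

process bar {
    debug true

    input:
    tuple val(word), path(png_files)

    """
    echo "${word} ${png_files}"
    """
}

workflow {
    foo( params.num_files )

    words = Channel.of('a', 'b', 'c')

    // Wrap the emitted file list in an outer list so that combine
    // keeps it as a single tuple element instead of flattening it
    foo.out.files
        .map { png_files -> [ png_files ] }
        .set { wrapped_files }

    words
        .combine( wrapped_files )
        .set { combined }

    bar( combined )
}
```

Each 'bar' task should then receive one word together with all of the PNGs, however many 'foo' produced.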