Is there a way to handle a variable number of inputs/outputs in Nextflow?


Is there a way to handle a varying number of inputs/outputs in Nextflow? Sometimes, in the example below, process 'foo' will have three inputs (and therefore create three PNGs that need to be stitched together by 'bar'), but other times there will be two or four. I'd like process 'bar' to combine all existing files in 'foo.out.files', regardless of how many there are. As it stands, this only works when there are exactly three inputs in params.input, not two or four.

Thanks!

#!/usr/bin/env nextflow
nextflow.enable.dsl=2

process foo {
   input:
   path input_file

   output:
   path '*.png', emit: files

   """
   script that creates variable number of png files
   """
}

process bar {
   input:
   tuple path(file_1), path(file_2), path(file_3)

   """
   script that combines png files ${file_1} ${file_2} ${file_3}
   """
}

workflow {

   foo(params.input)
   bar(foo.out.files.collect())
}

UPDATE: When I pass the files to 'bar' together with another value in a tuple, I get 'Input tuple does not match input set cardinality' errors. For example:

params.num_files = 3

process foo {
   input:
   val num_files

   output:
   path '*.png', emit: files

   """
   touch \$(seq -f "%g.png" 1 ${num_files})
   """
}

process bar {
   debug true

   input:
   tuple val(word), path(png_files)

   """
   echo "${word} ${png_files}"
   """
}

workflow {
    foo( params.num_files )

    words = Channel.from('a','b','c')

    words
        .combine(foo.out.files)
        .set { combined }
    
    bar(combined)
}

Solution

  • You don't actually need the tuple qualifier here: the path input qualifier can also handle a collection of files. If you use a variable name or the * wildcard, the original filenames are preserved. In the example below, the variable refers to all of the files in the list, but you can also access specific entries if you need to:

    params.num_files = 3
    
    process foo {
       input:
       val num_files
    
       output:
       path '*.png', emit: files
    
       """
       touch \$(seq -f "%g.png" 1 ${num_files})
       """
    }
    
    process bar {
       debug true
    
       input:
       path png_files
    
       script:
       def first_png = png_files.first()
       def last_png = png_files.last()
    
       """
       echo ${png_files}
       echo "${png_files[0]}"
       echo "${first_png}, ${last_png}"
       """
    }
    
    workflow {
       bar( foo( params.num_files ) )
    }
    

    Results:

    $ nextflow run main.nf
    N E X T F L O W  ~  version 22.04.0
    Launching `main.nf` [mighty_solvay] DSL2 - revision: 662b108e42
    executor >  local (2)
    [e0/a619c4] process > foo [100%] 1 of 1 ✔
    [ba/5f8032] process > bar [100%] 1 of 1 ✔
    1.png 2.png 3.png
    1.png
    1.png, 3.png
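
    If you do need to keep another value alongside the files, as in the update, the cardinality error most likely comes from combine rather than from 'bar': when an item is a list, combine adds its elements individually, so with three PNGs each tuple becomes [a, 1.png, 2.png, 3.png] instead of the two elements that tuple val(word), path(png_files) declares. With a single PNG, 'foo' emits one path rather than a list, which is why that case happens to work. A minimal sketch of one workaround (my suggestion, not taken from the docs): reuse 'foo' and 'bar' from the update unchanged and wrap each emitted file list in an extra list, so combine keeps it as one element:

    ```nextflow
    params.num_files = 3

    process foo {
        input:
        val num_files

        output:
        path '*.png', emit: files

        script:
        """
        touch \$(seq -f "%g.png" 1 ${num_files})
        """
    }

    process bar {
        debug true

        input:
        tuple val(word), path(png_files)   // png_files: one path or a list of paths

        script:
        """
        echo "${word} ${png_files}"
        """
    }

    workflow {
        foo( params.num_files )

        words = Channel.of('a', 'b', 'c')

        // combine() flattens list items one level, so wrap the file list first:
        // each tuple then becomes [word, [1.png, 2.png, 3.png]]
        words
            .combine( foo.out.files.map { pngs -> [pngs] } )
            .set { combined }

        bar(combined)
    }
    ```

    The same wrapping also works when only one PNG is produced, because [path] is flattened back to a single path inside the tuple.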
    
    

    If you need to avoid potential filename collisions, you can have Nextflow rewrite the input filenames using a name pattern. If the name pattern is a simple string and a collection of files is received, each filename is given a numerical suffix representing its ordinal position in the list; a single file is staged under the plain name. For example, if we change the 'bar' process definition to:

    process bar {
    
       debug true
    
       input:
       path 'png_file'
    
       """
       echo png_file*
       """
    }
    

    We get:

    $ nextflow run main.nf
    N E X T F L O W  ~  version 22.04.0
    Launching `main.nf` [marvelous_bhaskara] DSL2 - revision: 980a2d067f
    executor >  local (2)
    [f8/190e2c] process > foo [100%] 1 of 1 ✔
    [71/e53b05] process > bar [100%] 1 of 1 ✔
    png_file1 png_file2 png_file3
    
    
    $ nextflow run main.nf --num_files 1
    N E X T F L O W  ~  version 22.04.0
    Launching `main.nf` [nauseous_brattain] DSL2 - revision: 980a2d067f
    executor >  local (2)
    [ce/2ba1b1] process > foo [100%] 1 of 1 ✔
    [a2/7b867e] process > bar [100%] 1 of 1 ✔
    png_file
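
    Collisions typically happen when files from several tasks are gathered together. As a minimal sketch, assume a hypothetical variant of 'foo' that runs once per word and always names its outputs 1.png, 2.png, and so on. After collect() gathers every task's PNGs into one flat list, several inputs share the same name and would collide when staged; the 'png_file' pattern gives them distinct numbered names instead:

    ```nextflow
    params.num_files = 3

    // Hypothetical variant of 'foo': one task per word, each writing 1.png .. N.png
    process foo {
        input:
        val word

        output:
        path '*.png', emit: files

        script:
        """
        touch \$(seq -f "%g.png" 1 ${params.num_files})
        """
    }

    process bar {
        debug true

        input:
        path 'png_file'   // staged as png_file1, png_file2, ... when a list arrives

        script:
        """
        echo png_file*
        """
    }

    workflow {
        words = Channel.of('a', 'b', 'c')
        foo( words )

        // collect() flattens the per-task lists into a single list of paths,
        // so 'bar' runs once with all of them (duplicate names included)
        bar( foo.out.files.collect() )
    }
    ```

    With three words and three files per task, 'bar' should run once and stage nine files, png_file1 through png_file9.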
    
    

    Note that the * and ? wildcards can be used to control the names of the staged files. The Nextflow documentation includes a table describing how each wildcard is replaced depending on the cardinality of the input collection. For example, if we again change the 'bar' process definition to:

    process bar {
    
       debug true
    
       input:
       path 'file*.png'
    
       """
       echo file*.png
       """
    }
    

    We get:

    $ nextflow run main.nf
    N E X T F L O W  ~  version 22.04.0
    Launching `main.nf` [small_poincare] DSL2 - revision: b106710bc6
    executor >  local (2)
    [7c/cf38b8] process > foo [100%] 1 of 1 ✔
    [a6/8cb817] process > bar [100%] 1 of 1 ✔
    file1.png file2.png file3.png
    
    
    $ nextflow run main.nf --num_files 1
    N E X T F L O W  ~  version 22.04.0
    Launching `main.nf` [friendly_pasteur] DSL2 - revision: b106710bc6
    executor >  local (2)
    [59/2b235b] process > foo [100%] 1 of 1 ✔
    [2f/76e4e2] process > bar [100%] 1 of 1 ✔
    file.png
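
    One consequence for stitching images in order: with file*.png, the shell glob sorts file10.png before file2.png once ten or more files arrive. The same documentation table also covers ?, where the number of ? characters sets the width of a zero-padded index (for example, file??.png stages file01.png, file02.png, ...). A minimal sketch, assuming fewer than 100 files and that the outputs of 'foo' are named 1.png, 2.png, ... as in the examples above, so the workflow can sort them numerically by base name before staging (the sorting step is an illustration, not part of the original answer):

    ```nextflow
    params.num_files = 12

    process foo {
        input:
        val num_files

        output:
        path '*.png', emit: files

        script:
        """
        touch \$(seq -f "%g.png" 1 ${num_files})
        """
    }

    process bar {
        debug true

        input:
        path 'file??.png'   // staged as file01.png, file02.png, ...

        script:
        """
        # the glob now expands in the same order as the input list
        echo file??.png
        """
    }

    workflow {
        foo( params.num_files )

        // Sort numerically by base name (1, 2, ..., 12) so that file01.png,
        // file02.png, ... follow the original numbering. [pngs].flatten() also
        // covers the single-file case, where foo emits one path instead of a list.
        sorted_pngs = foo.out.files.map { pngs ->
            [pngs].flatten().toSorted { it.baseName as Integer }
        }

        bar( sorted_pngs )
    }
    ```

    Fixing the order in the workflow keeps 'bar' simple: its script can rely on the glob expanding in the intended order.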