
nextflow: process all pairs of files from distinct directories


I have this structure:

|-- combine.nf
|-- first
|   |-- a.txt
|   |-- b.txt
|   `-- c.txt
|-- second
|   |-- A.txt
|   `-- B.txt

And combine.nf is:

#!/usr/bin/env nextflow
process sayHello {
  input:
    path first
    path second

  output:
    stdout

  script:
    """
    echo 'Hello couple (${first}, ${second})' 
    """
}

workflow {
    def files_first = Channel.fromPath("first/*.txt")
    def files_second = Channel.fromPath("second/*.txt")
    sayHello(files_first, files_second) | view { it }
}

The sayHello process is only called for two pairs (in fact, as many times as there are files in the smaller directory):

Hello couple (a.txt, A.txt)
Hello couple (b.txt, B.txt)

How can I process all possible pairs? Thanks in advance.

PS: this question is generic; in my case, one of the directories contains only one file.


Solution

  • A process that consumes elements from two independent 'queue' channels takes one value from each channel per execution, so it stops running as soon as the shorter channel is exhausted. That is exactly what you are seeing.

    What you need to do is combine both channels into a single one that contains all pairs:

    workflow {
        def files_first = Channel.fromPath("first/*.txt")
        def files_second = Channel.fromPath("second/*.txt")
        def all_pairs = files_first.combine(files_second)
        sayHello(all_pairs) | view { it }
    }

    Then modify the process so that it takes the combined channel as its only input:

    process sayHello {
      input:
        tuple path(first), path(second)
      
      ...
    }
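
    Putting the two pieces together, a complete combine.nf might look like the sketch below. It assumes the output and script sections elided above stay exactly as in the question and that the directory layout is the one shown there; nothing else is added.

    ```nextflow
    #!/usr/bin/env nextflow

    process sayHello {
      input:
        // A single input: each item is a [first, second] pair of files
        tuple path(first), path(second)

      output:
        stdout

      script:
        """
        echo 'Hello couple (${first}, ${second})'
        """
    }

    workflow {
        def files_first = Channel.fromPath("first/*.txt")
        def files_second = Channel.fromPath("second/*.txt")

        // combine emits the Cartesian product of the two channels,
        // one tuple per (first, second) pair
        def all_pairs = files_first.combine(files_second)

        sayHello(all_pairs) | view { it }
    }
    ```

    With three files in first and two in second, this should run sayHello six times, once per pair; the order of the printed lines is not guaranteed. For the special case in the PS, where one directory holds a single file, another option (not part of the answer above) is to keep the original two-input process and turn that side into a value channel, for example with Channel.fromPath("second/*.txt").first(), since a value channel can be read repeatedly.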