How do I process files with a matching pattern in Nextflow?


Suppose I have the following Nextflow channels:

Channel.fromFilePairs( "test/read*_R{1,2}.fa" )
       .set{ reads }
reads.view() 

Channel.fromPath(['test/lib_R1.fa','test/lib_R2.fa'] )
        .set{ libs }
libs.view()

Which results in:

// reads channel
[read_b, [<path>/test/read_b_R1.fa, <path>/test/read_b_R2.fa]]
[read_a, [<path>/test/read_a_R1.fa, <path>/test/read_a_R2.fa]]

// libs channel
<path>/test/lib_R1.fa
<path>/test/lib_R2.fa

How do I run a process foo on each matching read-lib pair, where the same libs are reused for all read pairs? Basically, I want to execute foo 4 times:

foo(test/read_b_R1.fa, test/lib_R1.fa)
foo(test/read_b_R2.fa, test/lib_R2.fa)
foo(test/read_a_R1.fa, test/lib_R1.fa)
foo(test/read_a_R2.fa, test/lib_R2.fa)

Solution

  • If you want to use the same library for all read pairs, what you really want is a value channel, which can be read an unlimited number of times without being consumed. Note that a value channel is implicitly created when a process is invoked with a simple value. That value could be a list of files, but it looks like you want just one library file to go with each of the R1 and R2 reads. I think the simplest solution is to include your process twice under different aliases, so that each alias can be passed the matching reads and library file:

    params.reads = 'test/read*_R{1,2}.fa'
    
    include { foo as foo_r1 } from './modules/foo.nf'
    include { foo as foo_r2 } from './modules/foo.nf'
    
    
    workflow {
    
        Channel
            .fromFilePairs( params.reads )
            .multiMap { sample, reads ->
                def (r1, r2) = reads
    
                read1:
                    tuple(sample, r1)
                read2:
                    tuple(sample, r2)
            }
            .set { reads }
    
        lib_r1 = file('test/lib_R1.fa')
        lib_r2 = file('test/lib_R2.fa')
    
        foo_r1(reads.read1, lib_r1)
        foo_r2(reads.read2, lib_r2)
    }
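
To illustrate the value channel point made at the top of this answer, here is a minimal sketch that contrasts the two channel types for the library input. It assumes the same `foo` module and the `test/` files from the question; using `.first()` to obtain a value channel and `reads[0]` to pick the R1 file are choices made for this illustration, not part of the solution above.

```nextflow
// Sketch only: queue channel vs value channel for the library input.
// Assumes ./modules/foo.nf (this answer) and the test/ files from the question.
include { foo } from './modules/foo.nf'

workflow {

    // One [sample, R1 file] item per sample (fromFilePairs sorts each pair by name)
    reads_r1 = Channel
        .fromFilePairs('test/read*_R{1,2}.fa')
        .map { sample, reads -> tuple(sample, reads[0]) }

    // Queue channel: emits lib_R1.fa once, so foo would run for one sample only.
    // lib_queue = Channel.fromPath('test/lib_R1.fa')

    // Value channel: can be read any number of times, so foo runs for every sample.
    lib_value = Channel.fromPath('test/lib_R1.fa').first()

    foo(reads_r1, lib_value)
}
```

Passing `file('test/lib_R1.fa')` directly, as the workflow above does, should behave the same way, since a plain value is treated as a value channel.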
    

    Contents of ./modules/foo.nf:

    process foo {
    
        debug true
    
        input:
        tuple val(sample), path(fasta)
        path(lib)
    
        script:
        """
        echo $sample, $fasta, $lib
        """
    }
    

    Results (this test run used three samples, readA to readC, so each alias ran 3 tasks):

    $ nextflow run main.nf 
    N E X T F L O W  ~  version 22.10.0
    Launching `main.nf` [confident_boyd] DSL2 - revision: 8c81e2d743
    executor >  local (6)
    [a8/e8a752] process > foo_r1 (2) [100%] 3 of 3 ✔
    [75/2b32f5] process > foo_r2 (3) [100%] 3 of 3 ✔
    readC, readC_R2.fa, lib_R2.fa
    
    readA, readA_R1.fa, lib_R1.fa
    
    readC, readC_R1.fa, lib_R1.fa
    
    readB, readB_R2.fa, lib_R2.fa
    
    readA, readA_R2.fa, lib_R2.fa
    
    readB, readB_R1.fa, lib_R1.fa
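
If you would rather keep a single process instead of two aliases, one possible alternative is to key every read and every lib by its R1/R2 suffix and join them with `combine(by: 0)`. This is only a sketch under a few assumptions: `foo_paired` is a hypothetical variant of `foo` that takes one three-element tuple, and the suffix is assumed to be the last `_`-separated token of the file's base name, as in the file names from the question.

```nextflow
// Hypothetical variant of foo: one [sample, fasta, lib] tuple per task.
process foo_paired {

    debug true

    input:
    tuple val(sample), path(fasta), path(lib)

    script:
    """
    echo $sample, $fasta, $lib
    """
}

workflow {

    // Key each library by its mate suffix, e.g. [R1, test/lib_R1.fa]
    libs = Channel
        .fromPath(['test/lib_R1.fa', 'test/lib_R2.fa'])
        .map { lib -> tuple(lib.baseName.tokenize('_').last(), lib) }

    // Split each read pair into one item per file, keyed the same way,
    // e.g. [R1, read_a, test/read_a_R1.fa]
    reads = Channel
        .fromFilePairs('test/read*_R{1,2}.fa')
        .flatMap { sample, files ->
            files.collect { fasta ->
                tuple(fasta.baseName.tokenize('_').last(), sample, fasta)
            }
        }

    // Join on the suffix, then drop the key: [sample, fasta, lib]
    paired = reads
        .combine(libs, by: 0)
        .map { mate, sample, fasta, lib -> tuple(sample, fasta, lib) }

    foo_paired(paired)
}
```

With the two read pairs from the question, this should launch four tasks, one per read file, each paired with the lib that carries the same suffix. The trade-off is that the pairing logic moves into channel operators, which is more flexible if further mates or libs are added but a little harder to read than the two-alias version.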