
Nextflow: channel.fromFilePairs() Map Post Processing


I have a question about channel.fromFilePairs(). I have the following Nextflow script:

params.reads = "/path/to/my_reads/sample03_L001_R{1,2}_001.fastq.gz"

my_reads_ch = channel.fromFilePairs(params.reads)

println "reads: $my_reads_ch"

The script prints [sample03_L001_R, [/path/to/my_reads/sample03_L001_R1_001.fastq.gz, /path/to/my_reads/sample03_L001_R2_001.fastq.gz]].

The desired output is [sample03, [/path/to/my_reads/sample03_L001_R1_001.fastq.gz, /path/to/my_reads/sample03_L001_R2_001.fastq.gz]].

How do I remove the "_L001_R"?

I've tried

channel.fromFilePairs(params.reads).map{it[0] - /_\w+/, it[1]}

That gives me an ERROR: Unknown method invocation 'negative' on Pattern type.

Any suggestions? Many thanks


Solution

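  • In Groovy, /_\w+/ on its own is a slashy string, i.e. a plain String rather than a regex. Use the tilde operator (~) to create a Pattern object first, so that the minus operator removes the regex match, and wrap the two values in tuple() so the closure returns a single item: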

    Channel
        .fromFilePairs( params.reads )
        .map { sample, reads -> tuple( sample - ~/_\w+$/, reads ) }
        .view()
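
    To see why the tilde matters, here is a minimal plain-Groovy sketch (no Nextflow needed) that compares the two forms of minus. The variable names are made up for illustration; the key string is the one from the question.

    ```groovy
    // Illustrative names only; the key is the one fromFilePairs produced in the question.
    def key = 'sample03_L001_R'

    // A slashy string is an ordinary String, so minus looks for that literal text.
    def asString = /_\w+$/
    def unchanged = key - asString

    // The ~ operator compiles the slashy string into a java.util.regex.Pattern,
    // so minus removes the first regex match instead.
    def asPattern = ~/_\w+$/
    def stripped = key - asPattern

    println unchanged
    println stripped
    ```

    Because \w also matches underscores, the pattern matches everything from the first underscore to the end of the key, which is why the whole "_L001_R" suffix disappears.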
    

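    A cleaner approach, though, is to put a * wildcard in your initial glob pattern in place of the sample name, and let the fromFilePairs channel factory strip the suffix for you. This also picks up every sample in the directory instead of just one. For example: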

    params.reads = "/path/to/my_reads/*_L001_R{1,2}_001.fastq.gz"
    
    Channel
        .fromFilePairs( params.reads )
        .view()
    

    Results:

    $ nextflow run main.nf 
    
     N E X T F L O W   ~  version 24.04.3
    
    Launching `main.nf` [golden_wescoff] DSL2 - revision: f7979e483d
    
    [sample01, [/path/to/my_reads/sample01_L001_R1_001.fastq.gz, /path/to/my_reads/sample01_L001_R2_001.fastq.gz]]
    [sample03, [/path/to/my_reads/sample03_L001_R1_001.fastq.gz, /path/to/my_reads/sample03_L001_R2_001.fastq.gz]]
    [sample02, [/path/to/my_reads/sample02_L001_R1_001.fastq.gz, /path/to/my_reads/sample02_L001_R2_001.fastq.gz]]
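
    As a rough mental model only (this is not Nextflow's actual implementation), the grouping above behaves as if the glob were a regex and the text matched by * became the key. The sketch below emulates that in plain Groovy; the regex, the variable names and the list of file names are assumptions made up for illustration.

    ```groovy
    // Hypothetical emulation of the glob *_L001_R{1,2}_001.fastq.gz as a regex.
    // Group 1 captures whatever the * wildcard would match.
    def globAsRegex = ~/(.*)_L001_R(?:1|2)_001\.fastq\.gz/

    // Example file names, modelled on the ones in the question.
    def names = [
        'sample01_L001_R1_001.fastq.gz',
        'sample01_L001_R2_001.fastq.gz',
        'sample03_L001_R1_001.fastq.gz',
        'sample03_L001_R2_001.fastq.gz',
    ]

    // Group the files by the captured prefix; fall back to the full name if nothing matches.
    def grouped = names.groupBy { name ->
        def m = globAsRegex.matcher(name)
        m.matches() ? m.group(1) : name
    }

    grouped.each { key, files -> println "[$key, $files]" }
    ```

    This is consistent with the output shown above, where the key is the sample name alone and no map step is needed.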