
Specifying Groovy Transpose Output


I'm currently building a processing pipeline for cDNA. One process in my pipeline outputs a list of seven FASTQ files together with a list of seven metadata maps, each containing an ID. I need each ID to be associated with the FASTQ file whose name contains the same ID, but currently the IDs are paired with the FASTQ files in the order in which the previous step produced them.

The channel in question, prior to using the transpose operator, looks like:

[
 [
  [id:L5ad_T1, single_end:true], 
  [id:L5Cd_T1, single_end:true], 
  [id:L5Ac_T1, single_end:true], 
  [id:L5Cc_T1, single_end:true], 
  [id:L5Ab_T1, single_end:true], 
  [id:L5Aa_T1, single_end:true], 
  [id:L5Ca_T1, single_end:true]
 ], 
 [
  /flexbar_trimmed_NNNACTCAGC_L5Cc.fastq, 
  /flexbar_trimmed_NNNATTAGC_L5Ab.fastq, 
  /flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq, 
  /flexbar_trimmed_NNNCTAGC_L5Ca.fastq, 
  /flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq, 
  /flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq, 
  /flexbar_trimmed_NNNTAAGC_L5Aa.fastq
 ]
]

After using the transpose operator on the data in the channel, the output is:


[[id:L5ad_T1, single_end:true], /flexbar_trimmed_NNNACTCAGC_L5Cc.fastq]
[[id:L5Cd_T1, single_end:true], /flexbar_trimmed_NNNATTAGC_L5Ab.fastq]
[[id:L5Ac_T1, single_end:true], /flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq]
[[id:L5Cc_T1, single_end:true], /flexbar_trimmed_NNNCTAGC_L5Ca.fastq]
[[id:L5Ab_T1, single_end:true], /flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq]
[[id:L5Aa_T1, single_end:true], /flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq]
[[id:L5Ca_T1, single_end:true], /flexbar_trimmed_NNNTAAGC_L5Aa.fastq]

While this is the correct format, each ID is now associated with the wrong FASTQ file. The ideal outcome, for instance for the first pair, would be:

[[id:L5ad_T1, single_end:true], /flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq]

Is there any way to associate each ID with the correct file?


Solution
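
The mismatch arises because transpose pairs items purely by position, so the first metadata map ends up with the first file regardless of its name. As a rough illustration, the sketch below uses Groovy's own List.transpose() as a stand-in for the Nextflow operator (an assumption made so that the snippet runs without Nextflow), with three of the IDs and file names from the question written as plain strings:

```groovy
// Plain Groovy; List.transpose() stands in for the Nextflow operator here.
def ids = ['L5ad_T1', 'L5Cd_T1', 'L5Ac_T1']
def fastqs = [
    'flexbar_trimmed_NNNACTCAGC_L5Cc.fastq',
    'flexbar_trimmed_NNNATTAGC_L5Ab.fastq',
    'flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq'
]

// Pairs element i of the first list with element i of the second;
// the sample name inside each file name is never consulted.
def byPosition = [ids, fastqs].transpose()
byPosition.each { println it }
```

The file list appears to be ordered by file name rather than by sample, so the match has to be made explicitly by key.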

  • One way would be to create a map of the metadata and a map of the FASTQ files, where both maps share identical keys. We can then loop through one of the maps and look up each key in the other. The flatMap operator can be used to flatten the output so that each pair is emitted separately. For example:

    ch = Channel.of(
        [
            [
                [id:'L5ad_T1', single_end:true], 
                [id:'L5Cd_T1', single_end:true], 
                [id:'L5Ac_T1', single_end:true], 
                [id:'L5Cc_T1', single_end:true], 
                [id:'L5Ab_T1', single_end:true], 
                [id:'L5Aa_T1', single_end:true], 
                [id:'L5Ca_T1', single_end:true]
            ],
            [
                file('/dir/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq'),
                file('/dir/flexbar_trimmed_NNNATTAGC_L5Ab.fastq'),
                file('/dir/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq'),
                file('/dir/flexbar_trimmed_NNNCTAGC_L5Ca.fastq'),
                file('/dir/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq'),
                file('/dir/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq'),
                file('/dir/flexbar_trimmed_NNNTAAGC_L5Aa.fastq')
            ]
        ]
    )
    
    workflow {
    
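        ch.flatMap { meta_list, fastq_list -> 
            // Key each meta map by its sample name: 'L5ad_T1' -> 'L5ad'
            def meta_map = meta_list.collectEntries { meta ->
                [ meta.id.split('_').first(), meta ]
            }
            // Key each file by the last '_'-separated token of its name,
            // extension removed: '..._L5ad.fastq' -> 'L5ad'
            def fastq_map = fastq_list.collectEntries { fastq ->
                [ fastq.simpleName.split('_').last(), fastq ]
            }
    
            // Emit one [meta, fastq] pair per shared key
            meta_map.collect { k, v -> [v, fastq_map[k]] }
        }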
        .view()
    }
    
    

    Results:

    $ nextflow run main.nf 
    N E X T F L O W  ~  version 23.04.1
    Launching `main.nf` [happy_waddington] DSL2 - revision: 345d777205
    [[id:L5ad_T1, single_end:true], /dir/flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq]
    [[id:L5Cd_T1, single_end:true], /dir/flexbar_trimmed_NNNGACTTAGC_L5Cd.fastq]
    [[id:L5Ac_T1, single_end:true], /dir/flexbar_trimmed_NNNGCGCAGC_L5Ac.fastq]
    [[id:L5Cc_T1, single_end:true], /dir/flexbar_trimmed_NNNACTCAGC_L5Cc.fastq]
    [[id:L5Ab_T1, single_end:true], /dir/flexbar_trimmed_NNNATTAGC_L5Ab.fastq]
    [[id:L5Aa_T1, single_end:true], /dir/flexbar_trimmed_NNNTAAGC_L5Aa.fastq]
    [[id:L5Ca_T1, single_end:true], /dir/flexbar_trimmed_NNNCTAGC_L5Ca.fastq]
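
    Note that if a sample has no matching file, fastq_map[k] above returns null and the pair is emitted with a missing file. The following is a minimal sketch of the same key-matching step in plain Groovy with a guard added, runnable without Nextflow. It assumes file names are plain strings (in the pipeline they are Path objects), the extension is stripped by hand to mimic simpleName, and the variable names are made up for illustration:

    ```groovy
    // Two of the samples from the question, with file names as plain strings
    def metaList = [
        [id: 'L5ad_T1', single_end: true],
        [id: 'L5Cc_T1', single_end: true]
    ]
    def fastqNames = [
        'flexbar_trimmed_NNNACTCAGC_L5Cc.fastq',
        'flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq'
    ]

    // 'L5ad_T1' -> 'L5ad'
    def metaMap = metaList.collectEntries { meta ->
        [ meta.id.split('_').first(), meta ]
    }

    // 'flexbar_trimmed_NNNCGCTTAGC_L5ad.fastq' -> 'L5ad'
    def fastqMap = fastqNames.collectEntries { name ->
        def base = name.take(name.indexOf('.'))   // drop the extension
        [ base.split('_').last(), name ]
    }

    // Fail loudly instead of emitting [meta, null] for an unmatched sample
    def pairs = metaMap.collect { key, meta ->
        assert fastqMap.containsKey(key) : "No FASTQ found for sample key '${key}'"
        [ meta, fastqMap[key] ]
    }
    pairs.each { println it }
    ```

    The same containsKey check could be placed inside the flatMap closure if a hard failure is preferred over silently passing a null file downstream.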