Access and modify single element of tuple in Nextflow/Groovy Channel


My Nextflow workflow is intended for multiple samples and begins by reading a CSV of sample-specific parameters. One of the columns holds one or more paths, separated by commas (the whole cell is quoted so the values don't split). I want to pass in multiple directories and have all of them staged in the process, but the number of directories may differ from sample to sample.

My CSV structure looks like this:

ID Sample Paths
A1 B1 "path1,path2,path3"
A2 B2 "path1"

And the channel that reads it looks like this:

Channel.fromPath(params.samples)
        .splitCsv(quote:'\"', header:true)
        .set { samples } 

Is there a way to convert the third column, Paths, into a list so I can pass it into a process as multiple paths? Currently, when I pass it in as a path, it only seems to keep the value before the first comma, so that for sample B1 the Paths value is just "path1".

UPDATE

I figured out how to access the column information using:

workflow {
    Channel.fromPath(params.samples)
        .splitCsv(quote:'\"', header:true)
        .map {col -> tuple(
            "${col.Sample}",
            "${col.ID}",
            "${col.Paths}".split(',')
        )
        }
        .set { samples }
    samples.view()
}

That let me split the string into a list, but I still can't declare it as a path input in the process. I expected it to work like the output of fromFilePairs, where a list of file paths can be passed in. I still need to figure out how to pass a list of files in as input.


Solution

  • You can use Groovy's collect method and a closure to transform each entry in the collection into a file object. The transformed collection can then be passed in as input using the path qualifier (or using the file qualifier if you're using an older Nextflow version):

    From the Nextflow documentation: "The normal file input constructs introduced in the Input of files section are valid for collections of multiple files as well."

    params.samples = './samples.csv'
    
    
    process example {
    
        debug true
    
        input:
        tuple val(sample), val(id), path(file_dirs)
    
        """
        echo "${sample}:${id} dirs:"
        ls -1d ${file_dirs}
        """
    }
    
    workflow {
    
        Channel
            .fromPath( params.samples )
            .splitCsv( quote:'\"', header:true )
            .map { row ->
                // Split the quoted cell on commas, then turn each string into a file object
                def file_dirs = row.Paths.split(',').collect { file(it) }
    
                tuple( row.Sample, row.ID, file_dirs )
            }
            .set { samples }
    
        example( samples )
    }
    

    Results:

    $ nextflow run main.nf 
    N E X T F L O W  ~  version 22.04.4
    Launching `main.nf` [disturbed_lorenz] DSL2 - revision: 31bed8176a
    executor >  local (2)
    [c6/2e9058] process > example (1) [100%] 2 of 2 ✔
    B2:A2 dirs:
    path1
    
    B1:A1 dirs:
    path1
    path2
    path3
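
The split-then-collect step inside the map closure can also be tried on its own in plain Groovy, which may help show why the bare result of split() is not accepted as a path input: it is an array of strings, not file objects. This is only a sketch under stated assumptions: the cell value is a hypothetical stand-in for the Paths column of sample B1, and java.nio.file.Paths.get() is used in place of Nextflow's file() function, which is not available outside a Nextflow script.

```groovy
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical cell value, mirroring the "Paths" column for sample B1.
String cell = 'path1,path2,path3'

// split() returns a String[]; the elements are plain strings, not file objects.
String[] parts = cell.split(',')

// collect() applies the closure to every element and returns a new List.
// Assumption: Paths.get() stands in here for Nextflow's file().
List<Path> dirs = parts.collect { Paths.get(it) }

println "element type before collect: ${parts[0].getClass().simpleName}"
println "number of directories: ${dirs.size()}"
println "all elements are Path objects: ${dirs.every { it instanceof Path }}"
```

A cell with a single value, such as "path1", goes through the same code and yields a one-element list, so samples with different numbers of directories need no special handling.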