My Nextflow workflow is intended for multiple samples and begins by reading a CSV of sample-specific parameters, where one of the columns lists one or more paths, separated by commas (the whole cell is quoted so the values don't split). I want to be able to pass in multiple directories and have all of them staged in the process, but the number of directories may not be the same for each sample.
My CSV structure looks like this:

| ID | Sample | Paths |
|---|---|---|
| A1 | B1 | "path1,path2,path3" |
| A2 | B2 | "path1" |
And the channel to read it looks like this:

    Channel.fromPath(params.samples)
        .splitCsv(quote:'\"', header:true)
        .set { samples }

Is there a way to convert the third column, `Paths`, into a list so I can pass it into a process as multiple paths? Currently, when I pass this in as a path, it only seems to keep the first value before the comma, so for sample B1 the `Paths` value is just "path1".
UPDATE
I figured out how to access the column information using

    workflow {
        Channel.fromPath(params.samples)
            .splitCsv(quote:'\"', header:true)
            .map { col -> tuple(
                "${col.Sample}",
                "${col.ID}",
                "${col.Paths}".split(',')
                )
            }
            .set { samples }
        samples.view()
    }

and I was able to split my string into a list, but now Nextflow does not allow me to declare that list as a `path` input in the process. I thought it would work like the output of `fromFilePairs`, where you can pass in a list of file paths. I still need to figure out how to pass a list of files in as input.
You can use Groovy's `collect` method and a closure to transform each entry in the collection into a file object. The transformed collection can then be passed in as input using the `path` qualifier (or the `file` qualifier if you're using an older Nextflow version):
> The normal file input constructs introduced in the Input of files section are valid for collections of multiple files as well.

    params.samples = './samples.csv'

    process example {

        debug true

        input:
        // file_dirs receives the whole list; every entry is staged into the task directory
        tuple val(sample), val(id), path(file_dirs)

        script:
        """
        echo "${sample}:${id} dirs:"
        ls -1d ${file_dirs}
        """
    }

    workflow {

        Channel
            .fromPath( params.samples )
            .splitCsv( quote:'\"', header:true )
            .map { row ->
                // split the quoted cell on commas, then turn each string into a file object
                def file_dirs = row.Paths.split(',').collect { file(it) }
                tuple( row.Sample, row.ID, file_dirs )
            }
            .set { samples }

        example( samples )
    }

Results:

    $ nextflow run main.nf
    N E X T F L O W ~ version 22.04.4
    Launching `main.nf` [disturbed_lorenz] DSL2 - revision: 31bed8176a
    executor > local (2)
    [c6/2e9058] process > example (1) [100%] 2 of 2 ✔
    B2:A2 dirs:
    path1
    B1:A1 dirs:
    path1
    path2
    path3
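
If the `Paths` cells might contain stray spaces around the commas, or point at directories that do not exist, a slightly stricter version of the `.map` step may help. The following is only a sketch under those assumptions, not part of the tested run above: it swaps `split(',')` for Groovy's `tokenize(',')` (which skips empty entries such as those left by a trailing comma), trims each entry, and passes `checkIfExists: true` to `file()` so that a missing path fails when the channel is built rather than inside the task.

```groovy
// Sketch only: a stricter variant of the channel construction shown earlier.
params.samples = './samples.csv'

workflow {

    Channel
        .fromPath( params.samples )
        .splitCsv( quote:'"', header:true )
        .map { row ->
            // tokenize(',') returns a List and drops empty entries
            def file_dirs = row.Paths
                .tokenize(',')
                .collect { file( it.trim(), checkIfExists: true ) }
            tuple( row.Sample, row.ID, file_dirs )
        }
        .view()
}
```

Relative entries are resolved by `file()` against the directory the pipeline is launched from, so absolute paths in the CSV are probably the safer choice if the workflow is run from different locations.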