def barcodes = (1..2).collect { String.format("barcode%02d", it) }
params.orifq = barcodes.collect { "fastq_pass/$it/*.fastq.gz" }
Channel
    .fromPath(params.orifq)
    .map { it -> [it.name.split("_")[2], it] }
    .groupTuple()
    .set{orifq_ch}
process cat {
    debug true
    publishDir = [ path: "Run/orifq", mode: 'copy' ]
    input:
    tuple val(bc), path(fq)
    output:
    path("*.fastq.gz")
    """
    cat ${fq} > ${bc}.fastq.gz
    """
}
process all_stats {
    debug true
    publishDir = [ path: "Run/stats", mode: 'copy' ]
    input:
    path ("*.fastq.gz")
    output:
    path ("all_stats.txt"), emit: all_stats
    """
    seqkit stat *.fastq.gz > all_stats.txt
    """
}
workflow {
    cat(orifq_ch)|collect|all_stats|view
}
In this code, process cat generates barcode01.fastq.gz and barcode02.fastq.gz, and all outputs from process cat are then processed together in all_stats.
However, the file column of all_stats.txt shows 1.fastq.gz instead of barcode01.fastq.gz, and the number 1 seems to be a FIFO serial number, not the barcode number.
How can I fix the code so that the barcode name is reported correctly?
Nextflow will rewrite input file names when a named pattern is used to declare a collection of files. In this case, the named pattern provided is "*.fastq.gz". Note that the * wildcard is used to control the names of staged files: here it is replaced by each file's ordinal number, which is why you see 1.fastq.gz instead of barcode01.fastq.gz. Otherwise, from the documentation on multiple input files:
When the input has a fixed file name and a collection of files is received by the process, the file name will be appended with a numerical suffix representing its ordinal position in the list.
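To make the renaming concrete, here is a small shell sketch that imitates what the work directory of all_stats ends up looking like. It is not Nextflow itself: the temporary directory, the `upstream` and `work` folder names, and the `ln -s` calls are hypothetical stand-ins for the staging step, assuming the default behaviour of symlinking inputs into the task directory.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area standing in for Nextflow's directories.
tmp=$(mktemp -d)
mkdir -p "$tmp/upstream" "$tmp/work"

# Pretend these are the outputs of process `cat`.
touch "$tmp/upstream/barcode01.fastq.gz" "$tmp/upstream/barcode02.fastq.gz"

# Imitate staging with the named pattern "*.fastq.gz":
# each input is linked under its ordinal position, not its original name.
i=1
for f in "$tmp"/upstream/*.fastq.gz; do
    ln -s "$f" "$tmp/work/$i.fastq.gz"
    i=$((i + 1))
done

# These names are all that `seqkit stat *.fastq.gz` would get to see.
staged=$(cd "$tmp/work" && ls)
echo "$staged"

# The original name survives only as the symlink target.
first_target=$(basename "$(readlink "$tmp/work/1.fastq.gz")")
echo "$first_target"

rm -rf "$tmp"
```

The listing contains only 1.fastq.gz and 2.fastq.gz, so any tool that reports the names it was given will report those, even though the links still point at the barcode files.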
However, the rewriting of input file names is completely optional. Instead, you can just use a regular variable to bind the collection of files. This can then be used accordingly in your process script, for example (untested):
params.reads = './fastq_pass/barcode{0[8-9],[1-5][0-9],6[0-4]}/*.fastq.gz'
params.outdir = './results'
process cat {
    publishDir "${params.outdir}/orifq", mode: 'copy'
    input:
    tuple val(bc), path(fq)
    output:
    path "${bc}.fastq.gz"
    """
    cat ${fq} > "${bc}.fastq.gz"
    """
}
process all_stats {
    publishDir "${params.outdir}/stats", mode: 'copy'
    input:
    path fastq_files
    output:
    path "all_stats.txt"
    """
    seqkit stat ${fastq_files} > all_stats.txt
    """
}
workflow {
    Channel.fromPath( params.reads )
        .map { it -> [it.name.split("_")[2], it] }
        .groupTuple()
        .set { orifq_ch }
    ...
}
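The reads pattern above will match barcodes 08 to 64 inclusive. A glob cannot express a numeric range directly, so the range is broken down into three sub-patterns (0[8-9], [1-5][0-9] and 6[0-4]), which are listed as comma-separated alternatives inside curly braces.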
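If you want to sanity-check the range before running the pipeline, a rough shell sketch like the one below can help. It writes the three alternatives out as separate globs, which is what the curly-brace group stands for; the scratch directory and the `reads.fastq.gz` placeholder name are made up for the test, and shell globbing is only assumed to agree with Nextflow's glob matching for these simple bracket expressions.

```shell
#!/bin/sh
set -eu

# Hypothetical layout: fastq_pass/barcode01 ... fastq_pass/barcode70,
# each holding one empty placeholder file.
tmp=$(mktemp -d)
n=1
while [ "$n" -le 70 ]; do
    bc=$(printf 'barcode%02d' "$n")
    mkdir -p "$tmp/fastq_pass/$bc"
    touch "$tmp/fastq_pass/$bc/reads.fastq.gz"
    n=$((n + 1))
done

cd "$tmp"

# The three alternatives from the brace group, written out one by one:
# 08-09, then 10-59, then 60-64.
set -- \
    fastq_pass/barcode0[8-9]/*.fastq.gz \
    fastq_pass/barcode[1-5][0-9]/*.fastq.gz \
    fastq_pass/barcode6[0-4]/*.fastq.gz

count=$#
first=$1
# Walk the list so that `last` ends up holding the final match.
for last in "$@"; do :; done

echo "$count files, from $first to $last"

cd - > /dev/null
rm -rf "$tmp"
```

With 70 barcode directories present, this reports 57 matches, starting at barcode08 and ending at barcode64, so barcodes 01 to 07 and 65 to 70 are excluded as intended.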