In my experience, FASTQ files can get quite large. Without knowing too much of the specifics, my recommendation would be to move the concatenation (and renaming) to a separate process. In this way, all of the 'work' can be done inside Nextflow's working directory.

Here's a solution that uses the new DSL 2. It uses the splitCsv operator to parse the metadata, then a map step that lists each directory and picks out the FASTQ files. Each sample's collection of files can then be passed into our 'concat_reads' process. To handle optionally gzipped files, you could try the following:
params.metadata = './metadata.csv'
params.outdir = './results'
process concat_reads {

    tag { sample_name }

    publishDir "${params.outdir}/concat_reads", mode: 'copy'

    input:
    tuple val(sample_name), path(fastq_files)

    output:
    tuple val(sample_name), path("${sample_name}.${extn}")

    script:
    // 'extn' is left without 'def' so that it stays visible to the output declaration
    if( fastq_files.every { it.name.endsWith('.fastq.gz') } )
        extn = 'fastq.gz'
    else if( fastq_files.every { it.name.endsWith('.fastq') } )
        extn = 'fastq'
    else
        error "Concatenation of mixed filetypes is unsupported"

    """
    cat ${fastq_files} > "${sample_name}.${extn}"
    """
}
process pomoxis {

    tag { sample_name }

    publishDir "${params.outdir}/pomoxis", mode: 'copy'

    cpus 18

    input:
    tuple val(sample_name), path(fastq)

    script:
    """
    mini_assemble \\
        -t ${task.cpus} \\
        -i "${fastq}" \\
        -o results \\
        -p "${sample_name}"
    """
}
workflow {

    fastq_extns = [ '.fastq', '.fastq.gz' ]

    Channel.fromPath( params.metadata )
        | splitCsv()
        | map { dir, sample_name ->

            // list the sample directory and keep only the FASTQ files
            def all_files = file(dir).listFiles()

            def fastq_files = all_files.findAll { fn ->
                fastq_extns.find { fn.name.endsWith( it ) }
            }

            tuple( sample_name, fastq_files )
        }
        | concat_reads
        | pomoxis
}
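
For reference, here is a rough idea of what the metadata file might look like. Since splitCsv() is called without any options and the map closure unpacks each row into dir and sample_name, the workflow appears to expect a headerless CSV with two columns: the directory holding a sample's FASTQ files, then the name to give the concatenated output. The directory paths and sample names below are made up purely for illustration:

```csv
/data/run1/barcode01,sample_A
/data/run1/barcode02,sample_B
```

Assuming the script is saved as main.nf (a hypothetical filename), the defaults could be overridden on the command line with something like `nextflow run main.nf --metadata ./metadata.csv --outdir ./results`.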