Nextflow rename barcodes and concatenate reads within barcodes

My current working directory has the following sub-directories

My Bash script

I wrote the above Bash script to do the following tasks:

  • rename the sub-directories (barcode01-12) using information from metadata.csv
  • concatenate the individual reads within each sub-directory and move the result up into $PWD
  • then I use these concatenated reads (one file per barcode) as input to my Nextflow script below:


How can I add the pre-processing tasks above (renaming and concatenating), or the Bash script itself, to the beginning of my Nextflow script?

  • In my experience, FASTQ files can get quite large. Without knowing too much about the specifics, my recommendation would be to move the concatenation (and renaming) into a separate process, so that all of the 'work' is done inside Nextflow's working directory. Here's a solution that uses the new DSL 2. It uses the splitCsv operator to parse the metadata, and a map closure to find the FASTQ files in each directory. The resulting collection is then passed into the 'concat_reads' process. To handle optionally gzipped files, you could try the following:

    params.metadata = './metadata.csv'
    params.outdir = './results'

    process concat_reads {
        tag { sample_name }
        publishDir "${params.outdir}/concat_reads", mode: 'copy'

        input:
        tuple val(sample_name), path(fastq_files)

        output:
        tuple val(sample_name), path("${sample_name}.${extn}")

        script:
        if( fastq_files.every { it.name.endsWith('.fastq.gz') } )
            extn = 'fastq.gz'
        else if( fastq_files.every { it.name.endsWith('.fastq') } )
            extn = 'fastq'
        else
            error "Concatenation of mixed filetypes is unsupported"
        """
        cat ${fastq_files} > "${sample_name}.${extn}"
        """
    }

    process pomoxis {
        tag { sample_name }
        publishDir "${params.outdir}/pomoxis", mode: 'copy'
        cpus 18

        input:
        tuple val(sample_name), path(fastq)

        script:
        """
        mini_assemble \\
            -t ${task.cpus} \\
            -i "${fastq}" \\
            -o results \\
            -p "${sample_name}"
        """
    }

    workflow {
        fastq_extns = [ '.fastq', '.fastq.gz' ]

        Channel.fromPath( params.metadata )
            | splitCsv()
            | map { dir, sample_name ->
                // Keep only the files in this directory that look like FASTQ
                def all_files = file(dir).listFiles()
                def fastq_files = all_files.findAll { fn ->
                    fastq_extns.find { fn.name.endsWith( it ) }
                }
                tuple( sample_name, fastq_files )
            }
            | concat_reads
            | pomoxis
    }
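
If it helps to see what 'concat_reads' does for each row of the metadata, here is a rough shell sketch of the same idea outside Nextflow. Everything in it is made up for illustration: the directory names, the sample names, the read contents, and the assumption that metadata.csv has no header, with the directory in column 1 and the sample name in column 2. It also shows why plain `cat` is enough for gzipped inputs: concatenated gzip members still form a valid gzip stream, which is not the case when gzipped and plain files are mixed.

```shell
#!/usr/bin/env bash
# Toy illustration only: all names and read contents below are hypothetical.
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Two hypothetical barcode directories, each holding two gzipped FASTQ chunks.
for bc in barcode01 barcode02; do
    mkdir -p "$bc"
    for i in 1 2; do
        printf '@%s_read%s\nACGT\n+\nIIII\n' "$bc" "$i" | gzip -c > "$bc/chunk_$i.fastq.gz"
    done
done

# Assumed metadata layout: no header, directory first, sample name second.
printf 'barcode01,sample_A\nbarcode02,sample_B\n' > metadata.csv

# For each row, concatenate that directory's chunks under the sample name.
# Concatenated gzip members are still a valid gzip stream, so `cat` suffices.
while IFS=, read -r dir sample_name; do
    cat "$dir"/*.fastq.gz > "$sample_name.fastq.gz"
done < metadata.csv

# Each merged file holds 2 records x 4 lines = 8 lines.
gzip -dc sample_A.fastq.gz | wc -l
```

The Nextflow version does the same thing, but each concatenation runs in its own task directory, so the original barcode directories are never renamed or modified.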