
Combining output from multiple Nextflow processes into another process (Nextflow DSL2, FastQC and MultiQC)


Goal: Take input and process it through:

  1. initial FastQC
  2. trimming with Sickle
  3. post-trimming FastQC
  4. process both initial and post-trimming FastQC outputs with MultiQC

Input:

  • sample 1: sample1_R1_.fastq, sample1_R2_.fastq
  • sample 2: sample2_R1_.fastq, sample2_R2_.fastq

Processes: each process contains a publishDir directive:

publishDir "${params.outdir}/<name>", mode: "copy"

where <name> is either "sickle", "fastqc", or "multiqc".

Output I am getting now:

  • fastqc directory: sample2 trimmed files (R1 and R2) and sample1 raw files (R1 and R2). The sample2 raw files and sample1 trimmed files are missing.
  • sickle directory: both sample1 and sample2 trimmed files (this is the desired output).
  • multiqc directory: an HTML file that contains results only for the files in the fastqc directory, so it is also missing the sample2 raw files and sample1 trimmed files.

I have the following code:

workflow {

    
    SICKLE( reads )
    fastqc_ch = FASTQC(reads, threads)
    
    sickle_fastqc_ch = SICKLE_FASTQC ( SICKLE.out.reads_trimmed , threads )
   
    fastqc_output = fastqc_ch.collect()
    sickle_fastqc_output = sickle_fastqc_ch.collect()
   
    combined_output = fastqc_output.merge(sickle_fastqc_output)
 
    MULTIQC( combined_output )
}

Need help

  • I thought that collecting the raw FastQC output and the trimmed FastQC output separately, then merging the two before sending them to the MULTIQC() process, would work.
  • Any thoughts?
  • Thanks!

Solution

  • I think the trick is to combine the FastQC outputs and the Sickle log files into a single channel before calling collect. You can use the mix operator for this. For example, using Conda to provide the tools:

    Contents of main.nf:

    params.reads = '/path/to/fastqs/*_R{1,2}.fastq.gz'
    params.multiqc_config = './assets/multiqc_config.yaml'
    
    include { FASTQC as FASTQC_RAW } from './modules/fastqc'
    include { FASTQC as FASTQC_TRIMMED } from './modules/fastqc'
    include { SICKLE_PE } from './modules/sickle'
    include { MULTIQC } from './modules/multiqc'
    
    
    workflow {
    
        reads = Channel.fromFilePairs( params.reads )
    
        multiqc_config = file( params.multiqc_config )
    
        FASTQC_RAW( reads )
        SICKLE_PE( reads )
        FASTQC_TRIMMED( SICKLE_PE.out.trimmed )
    
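        // Merge every QC output into one channel, drop the sample key,
        // then gather all files into a single list for MultiQC.
        Channel.empty()
            .mix( FASTQC_RAW.out )
            .mix( SICKLE_PE.out.log )
            .mix( FASTQC_TRIMMED.out )
            .map { sample, files -> files }
            .collect()
            .set { log_files }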
    
        MULTIQC( log_files, multiqc_config )
    }
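
    The mix-then-collect step in the workflow above can be tried in isolation. The following is only a minimal sketch: it assumes plain strings can stand in for the report files, and the channel names raw and trimmed and the file names are made up for illustration. Saved as its own script, it should print a single list holding all four names. Since mix does not guarantee ordering, the order of the names may vary between runs.

    ```nextflow
    workflow {

        // Hypothetical stand-ins for FASTQC_RAW.out and FASTQC_TRIMMED.out:
        // each item is a (sample, report) tuple.
        raw = Channel.of(
            [ 'sample1', 'sample1_R1_fastqc.zip' ],
            [ 'sample2', 'sample2_R1_fastqc.zip' ]
        )
        trimmed = Channel.of(
            [ 'sample1', 'sample1_R1.trimmed_fastqc.zip' ],
            [ 'sample2', 'sample2_R1.trimmed_fastqc.zip' ]
        )

        raw
            .mix( trimmed )                     // interleave both channels into one
            .map { sample, report -> report }   // drop the sample key
            .collect()                          // gather everything into one list
            .view()                             // print the single emitted list
    }
    ```

    Because collect emits exactly one item, a process that takes this channel as input runs once, with every report staged together.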
    

    Contents of ./modules/fastqc/main.nf:

    process FASTQC {
    
        tag { sample }
    
        input:
        tuple val(sample), path(reads)
    
        output:
        tuple val(sample), path("*_fastqc.{zip,html}")
    
        """
        fastqc -q ${reads}
        """
    }
    

    Contents of ./modules/sickle/main.nf:

    process SICKLE_PE {
    
        tag { sample }
    
        input:
        tuple val(sample), path(reads, stageAs: 'reads/*')
    
        output:
        tuple val(sample), path("*.trimmed.fastq.gz"), emit: trimmed
        tuple val(sample), path("${sample}.singles.fastq.gz"), emit: singles
        tuple val(sample), path("${sample}.log"), emit: log
    
        script:
        def (fq1, fq2) = reads
    
        """
        sickle pe \\
            -t sanger \\
            -g \\
            -f "${fq1}" \\
            -r "${fq2}" \\
            -o "${sample}_R1.trimmed.fastq.gz" \\
            -p "${sample}_R2.trimmed.fastq.gz" \\
            -s "${sample}.singles.fastq.gz" \\
            1> "${sample}.log"
        """
    }
    

    Contents of ./modules/multiqc/main.nf:

    process MULTIQC {
    
        input:
        path 'logs/*'
        path config
    
        output:
        path "multiqc_report.html", emit: html
        path "multiqc_data", emit: data
    
        """
        multiqc \\
            --config "${config}" \\
            .
        """
    }
    

    Contents of ./nextflow.config:

    params {
    
        outdir = './results'
    }
    
    process {
    
        withName: FASTQC {
    
            publishDir = [
                path: "${params.outdir}/fastqc",
                mode: 'copy',
            ]
    
            cpus = 1
            conda = 'fastqc=0.12.1'
        }
    
        withName: SICKLE_PE {
    
            publishDir = [
                path: "${params.outdir}/sickle",
                mode: 'copy',
            ]
    
            cpus = 1
            conda = 'sickle-trim=1.33'
        }
    
        withName: MULTIQC {
    
            publishDir = [
                path: "${params.outdir}/multiqc",
                mode: 'copy',
            ]
    
            cpus = 1
            conda = 'multiqc=1.14'
        }
    }
    
    conda {
    
        enabled = true
    }
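
    With the config above, both FastQC aliases publish into the same fastqc directory. If separate directories for raw and trimmed reports are preferred, something like the following might work. This is a sketch that assumes withName selectors also match the include aliases FASTQC_RAW and FASTQC_TRIMMED; the raw and trimmed subdirectory names are hypothetical. The blocks are meant to sit inside the existing process scope, after the FASTQC block, so that they should take precedence for publishDir while cpus and conda still come from the FASTQC block.

    ```nextflow
    process {

        // Assumption: selectors match the aliases used in the include statements.
        withName: FASTQC_RAW {

            publishDir = [
                path: "${params.outdir}/fastqc/raw",      // hypothetical subdirectory
                mode: 'copy',
            ]
        }

        withName: FASTQC_TRIMMED {

            publishDir = [
                path: "${params.outdir}/fastqc/trimmed",  // hypothetical subdirectory
                mode: 'copy',
            ]
        }
    }
    ```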
    

    Contents of ./assets/multiqc_config.yaml:

    module_order:
        - fastqc:
            name: 'FastQC (raw)'
            anchor: 'fastqc-raw'
            target: 'FastQC'
            path_filters_exclude:
                - './logs/*.trimmed_fastqc.zip'
        - sickle
        - fastqc:
            name: 'FastQC (trimmed)'
            anchor: 'fastqc-trimmed'
            target: 'FastQC'
            path_filters:
                - './logs/*.trimmed_fastqc.zip'
    
    run_modules:
        - fastqc
        - sickle
    
    plots_force_interactive: True
    
    show_analysis_time: False
    show_analysis_paths: False
    

    Results:

    $ nextflow run main.nf -ansi-log false
    N E X T F L O W  ~  version 23.04.1
    Launching `main.nf` [distraught_euler] DSL2 - revision: 971e2c9d1f
    Creating env using conda: fastqc=0.12.1 [cache /path/to/work/conda/env-d3b12ea84164cc521e82b56dc7f119d9]
    Creating env using conda: sickle-trim=1.33 [cache /path/to/work/conda/env-72d5fea3bee2c2c7bb1951c0356c97fa]
    [d2/302df1] Submitted process > SICKLE_PE (sample2)
    [11/13a1f3] Submitted process > SICKLE_PE (sample1)
    [ce/f8d7b9] Submitted process > SICKLE_PE (sample3)
    [6a/0588fc] Submitted process > FASTQC_RAW (sample3)
    [3a/deabf3] Submitted process > FASTQC_RAW (sample1)
    [95/e2ddb3] Submitted process > FASTQC_RAW (sample2)
    [dd/39b166] Submitted process > FASTQC_TRIMMED (sample2)
    [45/bdefdc] Submitted process > FASTQC_TRIMMED (sample3)
    [21/c15ebb] Submitted process > FASTQC_TRIMMED (sample1)
    Creating env using conda: multiqc=1.14 [cache /path/to/work/conda/env-39798d385be8fa0f1dce9354302302f0]
    [4b/45310d] Submitted process > MULTIQC