
baseDir issue with nextflow


This might be a very basic question for you guys. However, I have just started with Nextflow and I am struggling with even the simplest example.

I will first explain what I have done, and then the problem.

Aim: I want to build a workflow for my bioinformatics analyses like the one here (https://www.nextflow.io/example4.html)

Background: I have installed all the required packages, and they all run from the console without any errors.

My run: I used the same script as in the example, only replacing the directory names. Here is how I have arranged the directories:

Location of the script:

~/raman/nflow/script.nf

Location of the FASTQ files:

~/raman/nflow/Data/T4_1.fq.gz
~/raman/nflow/Data/T4_2.fq.gz

Location of the transcriptome file:

~/raman/nflow/Genome/trans.fa

The script

#!/usr/bin/env nextflow

/*
 * The following pipeline parameters specify the reference genomes
 * and read pairs and can be provided as command line options
 */
params.reads = "$baseDir/Data/T4_{1,2}.fq.gz"
params.transcriptome = "$baseDir/HumanGenome/SalmonIndex/gencode.v42.transcripts.fa"
params.outdir = "results"

workflow {
    read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )

    INDEX(params.transcriptome)
    FASTQC(read_pairs_ch)
    QUANT(INDEX.out, read_pairs_ch)
}

process INDEX {
    tag "$transcriptome.simpleName"

    input:
    path transcriptome

    output:
    path 'index'

    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}

process FASTQC {
    tag "FASTQC on $sample_id"
    publishDir params.outdir

    input:
    tuple val(sample_id), path(reads)

    output:
    path "fastqc_${sample_id}_logs"

    script:
    """
    fastqc "$sample_id" "$reads"
    """
}

process QUANT {
    tag "$pair_id"
    publishDir params.outdir

    input:
    path index
    tuple val(pair_id), path(reads)

    output:
    path pair_id

    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
}

Output:

(base) ntr@ser:~/raman/nflow$ nextflow script.nf
N E X T F L O W  ~  version 22.10.1
Launching `script.nf` [modest_meninsky] DSL2 - revision: 032a643b56
executor >  local (2)
[-        ] process > INDEX (gencode)       -
[28/02cde5] process > FASTQC (FASTQC on T4) [100%] 1 of 1, failed: 1 ✘
[-        ] process > QUANT                 -
Error executing process > 'FASTQC (FASTQC on T4)'

Caused by:
  Missing output file(s) `fastqc_T4_logs` expected by process `FASTQC (FASTQC on T4)`

Command executed:

  fastqc "T4" "T4_1.fq.gz T4_2.fq.gz"

Command exit status:
  0

Command output:
  (empty)

Command error:
  Skipping 'T4' which didn't exist, or couldn't be read
  Skipping 'T4_1.fq.gz T4_2.fq.gz' which didn't exist, or couldn't be read

Work dir:
  /home/ruby/raman/nflow/work/28/02cde5184f4accf9a05bc2ded29c50

Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`

I believe the problem lies in my understanding of baseDir. I am assuming that baseDir is the directory that contains my script.nf file. I am not sure what is going wrong or how I can fix it.

Could anyone please help or point me in the right direction?

Thank you


Solution

  • Caused by:
      Missing output file(s) `fastqc_T4_logs` expected by process `FASTQC (FASTQC on T4)`
    

    Nextflow complains when it can't find the declared output files. This can occur even if the command completes successfully, i.e. with exit status 0. The problem here is that fastqc simply skips files that don't exist or can't be read (e.g. permissions problems), but it does produce these warnings:

    Skipping 'T4' which didn't exist, or couldn't be read
    Skipping 'T4_1.fq.gz T4_2.fq.gz' which didn't exist, or couldn't be read
    

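    The solution is to make sure every argument passed to fastqc is a file that exists. Here, neither is. fastqc treats each positional argument as an input file, so the sample ID "T4" is skipped. The fromFilePairs factory method emits the reads as a list of files in the second tuple element. Quoting "$reads" therefore turns the pair into a single argument containing a space, "T4_1.fq.gz T4_2.fq.gz", which is not a file name either. Pass the reads unquoted and drop the sample ID.

    Note that fastqc does not create the fastqc_${sample_id}_logs directory declared in the output block, so the task would still fail with the same missing-output error. Create that directory first, then point fastqc at it with -o. fastqc requires the output directory to exist already:

    script:
    """
    mkdir fastqc_${sample_id}_logs
    fastqc -o fastqc_${sample_id}_logs ${reads}
    """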