Bowtie2 error: "reference_index.1.bt2" does not exist or is not a Bowtie2 index in nextflow


I am trying to run Bowtie2 alignment in a Nextflow pipeline, but I'm encountering an error where Bowtie2 can't find the index files. The error message indicates that "reference_index.1.bt2" does not exist or is not a Bowtie 2 index.

Here is my code.

// Define the process for Bowtie2 alignment
process bowtie2 {

    input:
    path bowtie2_out
    path trimmomatic_out

    output:
    path "${params.outputDir8}/out.sam"

    script:
    """
    baseName=\$(basename \$(find ${bowtie2_out} -name '*.bt2' | head -n 1) | sed 's/\\.[0-9]*\\.bt2\$//')
    indexPath="${bowtie2_out}/\${baseName}"
   
    bowtie2 -x \${indexPath} \
            -1 ${trimmomatic_out}/output_1P.fq \
            -2 ${trimmomatic_out}/output_2P.fq \
            -S ${params.outputDir8}/out.sam
    """
}

I am trying to dynamically extract the base name of the Bowtie2 reference index and use it for the -x option in the bowtie2 command. However, the command error I get is:

Command error:
  Index path is: reference_index.1.bt2 reference_index.2.bt2 reference_index.3.bt2 reference_index.4.bt2 reference_index.rev.1.bt2 reference_index.rev.2.bt2/reference_index
  (ERR): "reference_index.1.bt2" does not exist or is not a Bowtie 2 index
  Exiting now ...

The complete error message that I got is:

ERROR ~ Error executing process > 'bowtie2'

Caused by:
  Process `bowtie2` terminated with an error exit status (255)


Command executed:

  baseName=$(basename $(find reference_index.1.bt2 reference_index.2.bt2 reference_index.3.bt2 reference_index.4.bt2 reference_index.rev.1.bt2 reference_index.rev.2.bt2 -name '*.bt2' | head -n 1) | sed 's/\.[0-9]*\.bt2$//')
  indexPath="reference_index.1.bt2 reference_index.2.bt2 reference_index.3.bt2 reference_index.4.bt2 reference_index.rev.1.bt2 reference_index.rev.2.bt2/${baseName}"
  echo "Index path is: ${indexPath}"
  bowtie2 -x ${indexPath}             -1 trimmomatic_out/output_1P.fq             -2 trimmomatic_out/output_2P.fq             -S bowtie2_out/out.sam

Command exit status:
  255

Command output:
  Index path is: reference_index.1.bt2 reference_index.2.bt2 reference_index.3.bt2 reference_index.4.bt2 reference_index.rev.1.bt2 reference_index.rev.2.bt2/reference_index

Command error:
  Index path is: reference_index.1.bt2 reference_index.2.bt2 reference_index.3.bt2 reference_index.4.bt2 reference_index.rev.1.bt2 reference_index.rev.2.bt2/reference_index
  (ERR): "reference_index.1.bt2" does not exist or is not a Bowtie 2 index
  Exiting now ...

Work dir:
  /mnt/d/somil_ILBS/work/work/b3/358c00e746858cc5c99a755b28eba2

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

 -- Check '.nextflow.log' file for details

I have tried various methods to extract the base name from the .bt2 files, but the problem seems to lie in how the path is being constructed. How can I correctly construct the path to the Bowtie2 index so that Bowtie2 recognizes it and proceeds with the alignment?


Solution

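  • From the command executed, it looks like your script treats bowtie2_out as a directory, but it is in fact just a space-separated list of index files. As a result, indexPath ends up as that whole list followed by /reference_index, and Bowtie2 takes the first file name as the index. My suggestion would be to modify the upstream genome indexing process to output these files into a directory so that they can be readily used downstream, for example: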

    process bowtie2_build {
    
        input:
        path fasta
    
        output: 
        path "bowtie2"
    
        script:
        """
        mkdir bowtie2
    
        bowtie2-build \\
            --threads ${task.cpus} \\
            "${fasta}" \\
            "bowtie2/${fasta.baseName}"
        """
    }
    

    Then feed bowtie2_build.out into your bowtie2 process. Your refactored bowtie2 process should look something like:

    process bowtie2 {
    
        publishDir "${params.outputDir}/bowtie2", mode: 'copy'
    
        input:
        tuple val(sample), path(fastq1), path(fastq2)
        path bowtie_index
    
        output:
        tuple val(sample), path("${sample}.sam")
    
        script:
        """
        indexPath=\$(find -L . -name '*.rev.1.bt2' | sed 's/\\.rev\\.1\\.bt2\$//')
       
        bowtie2 \\
            -x "\${indexPath}" \\
            -1 "${fastq1}" \\
            -2 "${fastq2}" \\
            -S "${sample}.sam"
        """
    }
    

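    Note that you can use the publishDir directive to direct your process outputs to your designated output directory. This way the process itself only writes files into its task working directory (i.e. under ./work), and Nextflow copies the declared outputs to the publish directory afterwards. The Trimmomatic process should also output a tuple containing the sample name and the two paired FASTQ files, to match the tuple input declared above.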
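
    To see what the find/sed line in the refactored process does, the following is a minimal shell sketch that can be run outside Nextflow. It assumes a throwaway sandbox made with mktemp, empty placeholder files named like the index in the question, and a symlinked bowtie2 directory standing in for the way Nextflow usually stages inputs. The sandbox layout and the variable names are illustrative, not part of the original pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sandbox standing in for a Nextflow task work directory.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/store/bowtie2" "$sandbox/task"

# Empty placeholder files named like the index from the question
# (not real Bowtie2 indexes, only the names matter here).
for suffix in 1 2 3 4 rev.1 rev.2; do
    touch "$sandbox/store/bowtie2/reference_index.$suffix.bt2"
done

# Assumption: the index directory is staged into the task dir as a symlink.
ln -s "$sandbox/store/bowtie2" "$sandbox/task/bowtie2"
cd "$sandbox/task"

# Without -L, find does not descend into the symlinked directory.
without_follow=$(find . -name '*.rev.1.bt2')

# With -L the file is found; sed strips the suffix, leaving the prefix for -x.
# head -n 1 guards against more than one index being present.
index_prefix=$(find -L . -name '*.rev.1.bt2' | head -n 1 | sed 's/\.rev\.1\.bt2$//')

echo "without -L: '${without_follow}'"
echo "with -L:    '${index_prefix}'"
```

    With this layout the second line should print ./bowtie2/reference_index, which is the form of value that bowtie2 -x expects: a path prefix shared by all the .bt2 files rather than the name of any single file. Inside a Nextflow script block the same command needs the extra escaping shown in the process above (\$ and \\.).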