bash, loops, parallel-processing, bioinformatics, fastq

Combine GNU parallel with nested for loops and multiple variables


I have n folders in destdir. Each folder contains two files: *R1.fastq and *R2.fastq. The script below runs bowtie2 on the folders one at a time and writes {name of the subfolder}.sam to destdir.

#!/bin/bash

mm9_index="/Users/bowtie2-2.2.6/indexes/mm9/mm9"
destdir=/Users/Desktop/test/outdir/

for f in "$destdir"/*
do
  fbase=$(basename "$f")
  echo "Sample $fbase"
  bowtie2 -p 4 -x "$mm9_index" -X 2000 \
    -1 "$f"/*R1.fastq \
    -2 "$f"/*R2.fastq \
    -S "$destdir/${fbase}.sam"
done

I want to use GNU parallel to speed this up. Can you help? Thanks.


Solution

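  • Wrap the per-sample work in a bash function, export it, and let parallel call it once per folder:

    #!/bin/bash

    # Set and export these at script level: the function runs in a child
    # shell started by parallel, and the glob on the last line needs
    # destdir too.
    export mm9_index="/Users/bowtie2-2.2.6/indexes/mm9/mm9"
    export destdir=/Users/Desktop/test/outdir/

    my_bowtie() {
      f="$1"
      fbase=$(basename "$f")
      echo "Sample $fbase"
      bowtie2 -p 4 -x "$mm9_index" -X 2000 \
        -1 "$f"/*R1.fastq \
        -2 "$f"/*R2.fastq \
        -S "$destdir/${fbase}.sam"
    }
    export -f my_bowtie
    # The trailing slash restricts the glob to the sample folders, so
    # .sam files already written to destdir are not passed as input.
    parallel my_bowtie ::: "$destdir"/*/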
    

    For more details: man parallel or http://www.gnu.org/software/parallel/man.html#EXAMPLE:-Calling-Bash-functions
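
    One thing to watch: each bowtie2 call already uses 4 threads (-p 4), and parallel starts several jobs at once, so the machine can end up oversubscribed. A minimal sketch of capping the job count with parallel's -j option follows. The helper name jobs_for and the cores-divided-by-threads rule are illustrative assumptions, not something parallel or bowtie2 prescribes, and the best split depends on your hardware and disk.

```shell
#!/bin/bash

# Threads used by each bowtie2 job; matches the -p 4 in my_bowtie.
threads_per_job=4

# jobs_for CORES THREADS (hypothetical helper)
# Prints how many THREADS-wide jobs fit on CORES cores, never less than 1.
jobs_for() {
  local cores="$1" threads="$2"
  local n=$(( cores / threads ))
  if (( n < 1 )); then
    n=1
  fi
  echo "$n"
}

# getconf _NPROCESSORS_ONLN reports the online core count on Linux and macOS;
# fall back to 1 if it is unavailable.
cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
njobs=$(jobs_for "$cores" "$threads_per_job")
echo "Would run $njobs job(s) of $threads_per_job threads on $cores core(s)"

# With my_bowtie exported and destdir set in this shell, the call becomes:
#   parallel -j "$njobs" my_bowtie ::: "$destdir"/*/
# Add --dry-run to print the commands without running them.
```

    On an 8-core machine this gives 2 concurrent jobs of 4 threads each. If alignment is limited by disk rather than CPU, a lower value may work better.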