bash ubuntu gnu gnu-parallel

GNU parallel is merging the output of multiple files into one. Why?


I am running GNU parallel. Unlike in my other analyses, the output of this run is not what I expected.

My code:

# set the path of the required program
samtools=/usr/local/apps/samtools/0.1.19-gcc412/samtools
TempDir=/gpfs_common/share03/uncg/bkgiri/apps/temp_files/


# run the process for 4 samples on 4 different cores
parallel --tmpdir ${TempDir} --jobs 4 ${samtools} view -b -q 40 realigned_{}.bam -L DNA_Samples.Passed_Variants.Final.bed > realigned_{}Filtered.bam ::: ms01e ms02g ms03g ms04h
  • I was expecting a separate output file for each input, named realigned_ms01eFiltered.bam, realigned_ms02gFiltered.bam, etc.
  • Instead, I am getting one large file named realigned_{}Filtered.bam. I never encountered this problem with other tools.

I also tried putting the whole command in single quotes:

parallel --tmpdir ${TempDir} --jobs 4 '${samtools} view -b -q 40 realigned_{}.bam -L DNA_Samples.Passed_Variants.Final.bed > realigned_{}Filtered.bam' ::: ms01e ms02g ms03g ms04h

# which now gives me a different error

Any suggestions?


Solution

  • As mentioned by @choroba, > is interpreted as a redirection by the shell before parallel ever sees it.
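
The effect can be seen without parallel at all. The following is a minimal sketch, assuming a hypothetical helper `show_args` that stands in for parallel and does nothing except print the arguments it receives. With an unquoted `>`, the redirection and the file name never reach the helper; with a quoted `">"`, they arrive as ordinary arguments.

```shell
#!/usr/bin/env bash
# show_args is a hypothetical stand-in for parallel: it only prints the
# arguments it was given, each wrapped in angle brackets.
show_args() { printf '<%s>' "$@"; printf '\n'; }

workdir=$(mktemp -d)   # scratch directory so nothing lands in the current one

# Unquoted >: the calling shell strips the redirection and opens the file
# itself, so the helper never sees ">" or the file name. The file keeps the
# literal name out_{}.txt because nothing has replaced {} yet.
show_args echo {} > "$workdir/out_{}.txt" ::: a b

# Quoted ">": the helper receives it as an ordinary argument.
quoted=$(show_args echo {} ">" 'out_{}.txt' ::: a b)

cat "$workdir/out_{}.txt"   # arguments seen in the unquoted case
echo "$quoted"              # arguments seen in the quoted case
```

This matches the symptom in the question: a single file with the literal name realigned_{}Filtered.bam, created by the calling shell, that collects the output of every job.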

    So I found two ways of working around this problem.

    1. Method A: Wrap the whole command in double quotes (" "), so that the redirection is passed to parallel as part of the command and applied once per job. I think this is the cleaner of the two.

      parallel --tmpdir ${TempDir} --jobs 4 "${samtools} view -b -q 40 realigned_{}.bam -L DNA_Samples.Passed_Variants.Final.bed > realigned_{}Filtered.bam" ::: ms01e ms02g ms03g ms04h
      
    2. Method B: Quote only the redirection operator (">"). The calling shell then passes > to parallel as a literal argument, and the shell that parallel starts for each job interprets it as a redirection after {} has been replaced.

      parallel --tmpdir ${TempDir} --jobs 4 ${samtools} view -b -q 40 realigned_{}.bam -L DNA_Samples.Passed_Variants.Final.bed ">" realigned_{}Filtered.bam ::: ms01e ms02g ms03g ms04h
      

    I tested both methods and they give me exactly the same result, so either one is safe to use.
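
A note on why the single-quoted attempt in the question may have failed while the double-quoted Method A works. This is an assumption, since the error message is not shown: inside single quotes the calling shell does not expand `${samtools}`, so the shell that runs each job has to expand it, and a variable that was not exported is empty there. A minimal sketch, using `sh -c` as a stand-in for the job shell and a hypothetical variable `demo_tool` in place of `${samtools}`:

```shell
#!/usr/bin/env bash
# demo_tool is a hypothetical stand-in for ${samtools}; it is deliberately
# not exported, just like the variable in the question.
demo_tool=echo

# Double quotes: the calling shell expands ${demo_tool} before sh starts.
double=$(sh -c "${demo_tool} sample_ms01e")

# Single quotes: sh receives the literal text ${demo_tool} and finds it unset.
single=$(sh -c 'echo "tool=[${demo_tool}]"')

# Passing the variable through the environment makes it visible to sh.
exported=$(demo_tool=$demo_tool sh -c 'echo "tool=[${demo_tool}]"')

printf '%s\n' "$double" "$single" "$exported"
```

With GNU parallel itself, adding `--dry-run` prints the generated commands instead of running them, which is a quick way to check both the replacement of {} and the expansion of variables before starting a long job.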

    Thanks,