Tags: bash, for-loop, bioinformatics, cat, fastq

Concatenate multiple sets of 2 fastq files in BASH


I'm trying to merge multiple sets of 2 fastq files from the same sequencing library. I have a txt file with all the sample names in it. The samples were sequenced paired-end, so each sample has both a _1.fastq.gz and a _2.fastq.gz file.

SRR_Acc_list.txt
SRR1
SRR2
SRR3
SRR4
...

The following commands show what I'm trying to achieve: combining SRR1 and SRR2, for both read 1 and read 2, into one fastq file per read in the output folder combined_fastq.

cat SRA/SRR1_1.fastq.gz SRA/SRR2_1.fastq.gz > combined_fastq/SRR1_1.fastq.gz

cat SRA/SRR1_2.fastq.gz SRA/SRR2_2.fastq.gz > combined_fastq/SRR1_2.fastq.gz

I'm having trouble figuring out how to do this for the rest of the samples in a loop, such as combining SRR3 and SRR4, SRR5 and SRR6, and so forth.


Solution

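  • Like most folk on StackOverflow, I have no idea about bioinformatics, fastq or "paired-ends"; however, I can reproduce the pattern you seem to want. Here xargs -n2 groups the sample names two per line, and the inner loop prints each pair once per read number: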

    xargs -n2 < SRR_Acc_list.txt |
       while read -r a b ; do
          for c in 1 2 ; do
             echo "$a, $b, $c"
          done
       done
    

    Sample Output

    SRR1, SRR2, 1
    SRR1, SRR2, 2
    SRR3, SRR4, 1
    SRR3, SRR4, 2
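
    To turn that pattern into the actual merge, the echo can be swapped for the cat commands from the question. What follows is only a sketch under a few assumptions: the function name combine_pairs and its three arguments (list file, input folder, output folder) are made up for illustration, the file naming follows the SRA/SRRn_1.fastq.gz layout shown in the question, and a sample left without a partner (an odd number of names) is skipped with a warning rather than copied. Concatenating gzip files directly is fine, since the result is a valid multi-member gzip stream.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not from the original answer): merge consecutive
# pairs of samples listed in a text file, separately for read 1 and read 2.
#   $1 = file with one sample name per line
#   $2 = folder holding <sample>_1.fastq.gz and <sample>_2.fastq.gz
#   $3 = output folder (created if missing)
combine_pairs() {
    local list=$1 in_dir=$2 out_dir=$3
    mkdir -p "$out_dir"

    # xargs -n2 prints the names two per line: "SRR1 SRR2", "SRR3 SRR4", ...
    xargs -n2 < "$list" |
        while read -r a b; do
            # With an odd number of samples the last name has no partner.
            if [ -z "$b" ]; then
                echo "skipping unpaired sample: $a" >&2
                continue
            fi
            for c in 1 2; do
                # The merged file is named after the first sample of the
                # pair, as in the question (SRR1 + SRR2 -> SRR1).
                cat "$in_dir/${a}_${c}.fastq.gz" "$in_dir/${b}_${c}.fastq.gz" \
                    > "$out_dir/${a}_${c}.fastq.gz"
            done
        done
}
```

    With the layout from the question it would be called as: combine_pairs SRR_Acc_list.txt SRA combined_fastq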