bash unix fastq

Executing a command on paired files and then renaming the outputs


I have a script, command.py, that merges together two paired files, CA01_S1_R1.fastq and CA01_S1_R2.fastq. It then writes the result into a new directory, paired.out, and names the resulting file paired.fastq. The full command reads:

command.py -f CA01_S1_R1.fastq -r CA01_S1_R2.fastq -o paired.out

However, I would like to execute this command on many files, and then have all of the outputs saved into the same directory. Furthermore, the outputs need to have unique names. So, I want to send files 2 and 3 as well, effectively running these commands also:

command.py -f CA02_S2_R1.fastq -r CA02_S2_R2.fastq -o paired.out

command.py -f CA03_S3_R1.fastq -r CA03_S3_R2.fastq -o paired.out

However, even if I had code to loop this command on all the samples, the command would keep overwriting the output of the last pairing, as all outputs are saved inside the folder paired.out, with the filename paired.fastq. Is there a simple loop I can write that will send each file pair through the command, then enter the folder and rename the file output from paired.fastq to CA01_paired.fastq, and then repeat for all my files?

I know that I can send multiple files through the command using:

for f in CA*_S*_R1.fastq; do
    # Replace R1 with R2 in the filename and run the command on both files.
    command.py -f "$f" -r "${f/R1/R2}" -o paired.ends
done; unset -v f

I'd like to add a second instruction to this loop to just cd into this folder, and rename the file, incrementing by 1 each time. I don't know how to set the increment variable. I imagine it would look something like this:

for f in CA*_S*_R1.fastq; do
    # Replace R1 with R2 in the filename and run the command on both files.
    command.py -f "$f" -r "${f/R1/R2}" -o paired.ends
    # cd into the output folder
    cd paired.ends
    # create a variable that keeps track of which file number I am on
    g=01
    # rename the output file
    mv fastqjoin.join.fastq CA$g_fastqjoin.join.fastq
    # update the variable that keeps track of which file number I am on
    g= g + 1
    # cd out of the folder where the outputs are stored and back to the folder that contains all the files to be paired
    cd ..
done; unset -v f

Solution

  • Assuming the files are paired via blah_R1.fastq and blah_R2.fastq:

    for f in *_R1.fastq; do
        r=${f/_R1/_R2}
        command.py -f "$f" -r "$r" -o paired.out &&
            mv paired.out/paired.fastq paired.out/"${f%%_*}_paired.fastq"
    done
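
To see what the two parameter expansions in that loop do, here is a rough dry run in a scratch directory. It is only a sketch: merge_pair is a hypothetical stand-in for command.py that mimics the behaviour described in the question (it always writes paired.fastq into the output directory), the dummy input files are made up, and the check for a missing R2 file is an extra that is not part of the loop above.

```bash
#!/usr/bin/env bash
# Dry run of the rename-after-merge loop in a temporary directory.

# Hypothetical stand-in for command.py: concatenates the two inputs and
# always writes the result to <outdir>/paired.fastq, as in the question.
merge_pair() {
    local fwd=$1 rev=$2 outdir=$3
    mkdir -p "$outdir" && cat "$fwd" "$rev" > "$outdir/paired.fastq"
}

workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create small dummy inputs named like the files in the question.
for s in CA01_S1 CA02_S2 CA03_S3; do
    printf '%s forward\n' "$s" > "${s}_R1.fastq"
    printf '%s reverse\n' "$s" > "${s}_R2.fastq"
done

for f in *_R1.fastq; do
    r=${f/_R1/_R2}      # first "_R1" becomes "_R2": CA01_S1_R2.fastq
    sample=${f%%_*}     # drop everything from the first "_" on: CA01
    # Skip forward reads that have no matching reverse file.
    [ -e "$r" ] || { echo "no mate for $f, skipping" >&2; continue; }
    # Rename only if the merge succeeded, so a failed run never gets
    # an old paired.fastq attached to the wrong sample name.
    merge_pair "$f" "$r" paired.out &&
        mv paired.out/paired.fastq "paired.out/${sample}_paired.fastq"
done

ls paired.out
```

Because the new name comes from the sample ID at the start of each filename, no counter and no cd are needed. If a running number is wanted anyway, one option in bash is to set n=1 before the loop (not inside it, where it would be reset on every pass), format it with printf -v g '%02d' "$n", and increment with n=$((n + 1)). The name then has to be written as "CA${g}_paired.fastq", since CA$g_paired.fastq would be read as a variable called g_paired.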