I have a large directory of files (100+) that I'd like to pass through a program via the terminal.
The files are paired and all follow a naming scheme like this:
TS-8_S53_L001_R1_001.fastq
TS-8_S53_L001_R2_001.fastq
RS-9_S54_L001_R1_001.fastq
RS-9_S54_L001_R2_001.fastq
The program is invoked like this:
Seqprogram -i1 Blah_R1_001.fastq -i2 Blah_R2_001.fastq -o Blah_paired.fastq
All of these files are in one directory.
I'd like to be able to run the program on every pair of files, with the files matched correctly (the R1 and R2 files share the same base name, and the R1 file is passed to -i1, the R2 file to -i2), and with the output file (-o) saved under the base name plus an identifier such as "_paired".
I have an idea of how I'd do this in Python; however, I'm trying to get better with Bash.
I know how to pass multiple files to a single command with a wildcard, e.g., decompressing all .gz files in a directory:
gunzip *.gz
But Seqprogram takes two inputs that must be matched up in pairs, so a single wildcard isn't sufficient.
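A note on the gunzip example: a wildcard inside double quotes is not expanded by the shell, so `gunzip "*.gz"` would look for a file literally named `*.gz`. The short demo below shows the difference; it works in a throwaway temporary directory, and the files a.gz and b.gz are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Demonstrate that quoting a wildcard suppresses filename expansion.
set -euo pipefail

demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT   # clean up the temp directory on exit
cd "$demo_dir"
touch a.gz b.gz                  # two hypothetical compressed files

unquoted=$(echo *.gz)            # unquoted: the shell expands the glob to the matching names
quoted=$(echo "*.gz")            # quoted: the pattern is passed through literally

echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```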
Use a wildcard to match one file of each pair (the R1 file), then use Bash's pattern-substitution parameter expansion, ${var/pattern/replacement}, to derive the corresponding R2 filename and the output filename.
for i1 in *_R1_001.fastq; do
    i2=${i1/R1_001/R2_001}        # matching R2 file, e.g. TS-8_S53_L001_R2_001.fastq
    paired=${i1/R1_001/paired}    # output name, e.g. TS-8_S53_L001_paired.fastq
    Seqprogram -i1 "$i1" -i2 "$i2" -o "$paired"
done
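A slightly more defensive sketch of the same idea, assuming every file follows the <base>_R1_001.fastq / <base>_R2_001.fastq pattern from the question. It strips the fixed suffix with ${i1%_R1_001.fastq} (anchored at the end of the name) instead of substituting the first match anywhere in it, skips an R1 file whose R2 partner is missing, enables nullglob so a directory with no matches doesn't run Seqprogram on the literal pattern, and offers a dry-run mode. The function name process_pairs and the DRY_RUN variable are hypothetical, introduced only for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a more defensive pairing loop (assumes <base>_R1_001.fastq /
# <base>_R2_001.fastq naming). Note: nullglob changes globbing for the whole shell.
shopt -s nullglob   # an unmatched glob expands to nothing instead of itself

# process_pairs (hypothetical name): run Seqprogram on every R1/R2 pair in
# the current directory. With DRY_RUN=1 it only prints the commands.
process_pairs() {
    local i1 base i2 out
    for i1 in *_R1_001.fastq; do
        base=${i1%_R1_001.fastq}          # strip the suffix, e.g. TS-8_S53_L001
        i2=${base}_R2_001.fastq           # expected partner file
        out=${base}_paired.fastq          # output name
        if [[ ! -e $i2 ]]; then
            echo "skipping $i1: $i2 not found" >&2
            continue
        fi
        if [[ ${DRY_RUN:-0} == 1 ]]; then
            echo Seqprogram -i1 "$i1" -i2 "$i2" -o "$out"
        else
            Seqprogram -i1 "$i1" -i2 "$i2" -o "$out"
        fi
    done
}
```

For example, cd into the directory holding the FASTQ files, run DRY_RUN=1 process_pairs to check the commands it would run, then run process_pairs to execute them.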