I'm doing a test with these files:
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R2_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R2_001.fastq
comp995_c0_seq1_Glicose_1_ACTTGA_merge_R2_001.fastq
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R2_001.fastq
I want to get the files that have the same code until the first _ (underscore) and have the code R1 in different output files. The output files should be called according with the code until the first _ (underscore).
-This is my code, but I'm having trouble on making the output files.
#!/bin/bash
for i in {900..995}; do
if [[ ${i} -eq ${i} ]]; then
cat comp${i}_*_R1_001.fastq
fi
done
-I want to have two outputs:
One output will have all lines from:
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
and its name should be comp900_R1.out
The other output will have lines from:
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq
and its name should be comp995_R1.out
Finally, as I said, this is a small test. I want my script to work with a lot of files that have the same characteristics.
Using awk
:
ls -1 *.fastq | awk -F_ '$8 == "R1" {system("cat " $0 ">>" $1 "_R1.out")}'
List all files *.fastq
into awk
, splitting on _
. Check if 8:th part $8
is R1
, then append cat >>
the file into first part $1
+ _R1.out
, which will be comp900_R1.out
or comp995_R1.out
. It is assumed that no filenames contain spaces or other special characters.
Result:
File comp900_R1.out
containing all lines from
comp900_c0_seq1_Glicose_1_ACTTGA_merge_R1_001.fastq
comp900_c0_seq2_Glicose_1_ACTTGA_merge_R1_001.fastq
and file comp995_R1.out
containing all lines from
comp995_c0_seq1_Xilano_1_AGTCAA_merge_R1_001.fastq