I have a bunch of files with mixed IDs in a directory (Linux environment) that look like this:
SRR7821874_1.fastq.gz
SRR7821874_2.fastq.gz
SRR7821870_1.fastq.gz
SRR7821870_2.fastq.gz
I also have a two-column, tab-delimited file (called rename.tsv) that maps each old ID to the new one I want to replace it with:
Read Sample
SRR7821874 GSM3385663
SRR7821870 GSM3385659
In the same step, I would also like to change _1 to _S1_L001_R1_001 and _2 to _S1_L001_R2_001 in the file names, so the final result should look like this:
SRR7821874_1.fastq.gz --> GSM3385663_S1_L001_R1_001.fastq.gz
SRR7821874_2.fastq.gz --> GSM3385663_S1_L001_R2_001.fastq.gz
SRR7821870_1.fastq.gz --> GSM3385659_S1_L001_R1_001.fastq.gz
SRR7821870_2.fastq.gz --> GSM3385659_S1_L001_R2_001.fastq.gz
I tried the following script (for the ID-replacement part only), without success. Apparently mv needs the full file names, not just the IDs:
while read -r Read Sample; do mv "$Read" "$Sample"; done < rename.tsv
You can try:
tail -n+2 rename.tsv | while IFS=$'\t' read -r from to; do   # skip the header line
    shopt -s nullglob                                        # a glob with no match expands to nothing
    for f in "${from}_"*.fastq.gz; do
        num="${f##*_}"; num="${num%%.*}"                     # SRR7821874_1.fastq.gz -> 1
        mv "$f" "${to}_S1_L001_R${num}_001.fastq.gz"
    done
done
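To check the loop before touching real data, here is a minimal sketch that recreates the example from the question in a scratch directory made with mktemp -d (the files are empty placeholders, not real FASTQ data). It wraps the same loop in a function called rename_all, a hypothetical name used only here, which prefixes mv with its first argument, so passing echo gives a dry run that only prints the planned commands.

```shell
#!/usr/bin/env bash
# Sketch: rehearse the renaming on empty placeholder files in a scratch directory.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT   # delete the scratch directory when the script exits
cd "$workdir"

# Recreate the example inputs from the question.
touch SRR7821874_1.fastq.gz SRR7821874_2.fastq.gz \
      SRR7821870_1.fastq.gz SRR7821870_2.fastq.gz
printf 'Read\tSample\nSRR7821874\tGSM3385663\nSRR7821870\tGSM3385659\n' > rename.tsv

# rename_all (hypothetical helper) is the answer's loop; "$1" is put in front of mv,
# so "rename_all echo" only prints the commands and "rename_all" runs them.
rename_all() {
    local run=${1:-}
    tail -n+2 rename.tsv | while IFS=$'\t' read -r from to; do
        shopt -s nullglob
        for f in "${from}_"*.fastq.gz; do
            num="${f##*_}"; num="${num%%.*}"
            $run mv "$f" "${to}_S1_L001_R${num}_001.fastq.gz"
        done
    done
}

planned=$(rename_all echo)   # dry run: nothing is renamed yet
printf '%s\n' "$planned"
rename_all                   # real run
ls
```

Once the printed mv commands look right, the loop can be run unchanged in the real directory.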
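We use tail to skip the header line, and we enable the nullglob bash option so that "${from}_"*.fastq.gz expands to nothing, instead of to the pattern itself, when no file matches. Without it, an ID with no matching files would make the loop run once with the literal pattern, and mv would fail. Because the while loop is part of a pipeline, bash runs it in a subshell by default, so the nullglob setting is discarded when the loop ends and the option in your current shell keeps its previous state.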
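Both points can be checked with a small sketch, assuming bash. It loops over a pattern that matches nothing (NOSUCHID is a made-up ID) with and without nullglob, then enables nullglob inside a pipeline, as the loop above does, and inspects the option afterwards in the current shell.

```shell
#!/usr/bin/env bash
# Sketch: effect of nullglob, and why setting it inside a pipeline does not leak out.
# NOSUCHID is a made-up ID with no matching files.
cd "$(mktemp -d)" || exit 1

shopt -u nullglob
without=()
for f in NOSUCHID_*.fastq.gz; do without+=("$f"); done   # runs once, f = literal pattern

shopt -s nullglob
with=()
for f in NOSUCHID_*.fastq.gz; do with+=("$f"); done      # body never runs

echo "without nullglob: ${#without[@]} iteration(s), f=${without[0]}"
echo "with nullglob:    ${#with[@]} iteration(s)"

# The while loop of a pipeline runs in a subshell, so the shopt change
# made there is discarded when the pipeline ends.
shopt -u nullglob
echo x | while read -r line; do shopt -s nullglob; done
if shopt -q nullglob; then state=on; else state=off; fi
echo "nullglob after the pipeline: $state"
```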
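"${f##*_}" removes the longest prefix of f that ends with an underscore (SRR7821874_1.fastq.gz becomes 1.fastq.gz), and "${num%%.*}" then removes the longest suffix that starts with a dot (1.fastq.gz becomes 1). Both are bash parameter expansions; the bash manual documents many more.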
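The two expansions can be traced step by step on one of the example names with the sketch below. The intermediate variable rest and the name MY_ID_1.fastq.gz are illustrative only; the second part shows why the doubled forms (## and %%) are the safer choice.

```shell
#!/usr/bin/env bash
# Sketch: the two expansions applied step by step to one example name.
f=SRR7821874_2.fastq.gz
rest="${f##*_}"      # remove the longest prefix matching "*_"  -> 2.fastq.gz
num="${rest%%.*}"    # remove the longest suffix matching ".*"  -> 2
short="${rest%.*}"   # single %: removes only the shortest suffix -> 2.fastq
printf '%s\n' "$rest" "$num" "$short"
echo "new name: GSM3385663_S1_L001_R${num}_001.fastq.gz"

# Hypothetical name whose ID itself contains an underscore:
g=MY_ID_1.fastq.gz
long_g="${g##*_}"    # longest prefix "MY_ID_" removed  -> 1.fastq.gz
short_g="${g#*_}"    # shortest prefix "MY_" removed    -> ID_1.fastq.gz
printf '%s\n' "$long_g" "$short_g"
```

With the single-character forms, the number would come out as 2.fastq, or the ID fragment would leak into it, which is why the answer uses ## and %%.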
Note that you can use a more precise pattern if needed. For instance, if you know that the number is always 1 or 2, you could replace "${from}_"*.fastq.gz with "${from}_"[12].fastq.gz. Or, if it is any one-digit number, use "${from}_"[0-9].fastq.gz.
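The difference between the patterns shows up when unexpected files sit next to the real ones. This sketch creates, in a scratch directory, the two real files plus SRR7821874_10.fastq.gz and SRR7821874_extra.fastq.gz, which are hypothetical stray files. The loose * pattern matches all four (the strays would be renamed to ..._R10_001 and ..._Rextra_001), while the stricter patterns match only the two real files.

```shell
#!/usr/bin/env bash
# Sketch: compare the loose and stricter glob patterns in a scratch directory.
# SRR7821874_10.fastq.gz and SRR7821874_extra.fastq.gz are hypothetical stray files.
shopt -s nullglob
cd "$(mktemp -d)" || exit 1
touch SRR7821874_1.fastq.gz SRR7821874_2.fastq.gz \
      SRR7821874_10.fastq.gz SRR7821874_extra.fastq.gz

from=SRR7821874
loose=("${from}_"*.fastq.gz)        # anything between "_" and ".fastq.gz"
strict=("${from}_"[12].fastq.gz)    # exactly one character, 1 or 2
digit=("${from}_"[0-9].fastq.gz)    # exactly one digit

echo "loose  (${#loose[@]}): ${loose[*]}"
echo "strict (${#strict[@]}): ${strict[*]}"
echo "digit  (${#digit[@]}): ${digit[*]}"
```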