So I have gene files named gene_1.fa, gene_2.fa, ... gene_19500.fa and want to sort them into folders 200, 400, 600 ... 19600 for a downstream pipeline. I have an idea of how to do this, but it's pretty gruesome:
for file in "${files[@]}"; do
base_name=$(basename "$file")
gene_number=$(echo "$base_name" | cut -d'_' -f2 | cut -d'.' -f1)
to_path= (path to folder containing 200, 400, ... 19600 folders)
#if it's gene_200.fa, 400.fa etc. copy into that dir
if (( gene_number % 200 == 0 )); then
cp "$file" "$to_path/$gene_number/$file"
elif (( gene_number < 200 )); then
cp "$file" "$to_path/200/$file"
elif (( gene_number > 19400 )); then
cp "$file" "$to_path/19600/$file"
# the endless pain of 200-400, 400-600, 600-800 ... 19200-19400
elif (( gene_number > 200 && gene_number < 400 )); then
cp "$file" "$to_path/400/$file"
elif ....
My question is then: is there a less tedious way to do this without copying any one file into multiple folders? (E.g. if I only tested gene number < folder name, a file named gene_3.fa would be copied into every folder.)
You could do this: just change the for loop to iterate over your files, change the delta value to 200, and add the cp or mv as you like:
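#!/usr/bin/env bash
delta=5
for file in gene_{1..20}.fa; do
    if [[ "$file" =~ [0-9]+ ]]; then
        # the first run of digits in the file name is the gene number
        gene_number=${BASH_REMATCH[0]}
        bucket=$(( ((gene_number / delta) * delta) + delta ))
        echo "$file -> $bucket"
    fi
done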
$ ./
gene_1.fa -> 5
gene_2.fa -> 5
gene_3.fa -> 5
gene_4.fa -> 5
gene_5.fa -> 10
gene_6.fa -> 10
gene_7.fa -> 10
gene_8.fa -> 10
gene_9.fa -> 10
gene_10.fa -> 15
gene_11.fa -> 15
gene_12.fa -> 15
gene_13.fa -> 15
gene_14.fa -> 15
gene_15.fa -> 20
gene_16.fa -> 20
gene_17.fa -> 20
gene_18.fa -> 20
gene_19.fa -> 20
gene_20.fa -> 25
The math works because bash does integer arithmetic, not floating point, so the division discards any fractional part. With delta=5, for example, 7 / 5 evaluates to 1, and 1 * 5 + 5 gives bucket 10.
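One thing to watch: the formula above sends exact multiples to the next bucket (gene_5.fa -> 10 in the output), whereas the question wants gene_200.fa to land in folder 200. If that matters, rounding up instead of rounding down and adding delta might be closer to what was asked. Here is a rough sketch of that variant; the helper names bucket_for and sort_genes, the gene_N.fa naming, and the src/dest arguments are assumptions for illustration, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: round a gene number up to the nearest multiple of delta,
# so exact multiples (200, 400, ...) stay in their own folder.
bucket_for() {
    local n=$1 delta=${2:-200}
    echo $(( (n + delta - 1) / delta * delta ))
}

# Hypothetical helper: copy every gene_N.fa found in src into dest/<bucket>/.
sort_genes() {
    local src=$1 dest=$2 file bucket
    for file in "$src"/gene_*.fa; do
        [[ -e $file ]] || continue    # glob matched nothing: skip the literal pattern
        [[ ${file##*/} =~ ^gene_([0-9]+)\.fa$ ]] || continue
        bucket=$(bucket_for "${BASH_REMATCH[1]}")
        mkdir -p "$dest/$bucket"
        cp -- "$file" "$dest/$bucket/"
    done
}

# Dry run: print where a few boundary cases would go.
for n in 1 199 200 201 19500; do
    echo "gene_$n.fa -> $(bucket_for "$n")"
done
```

Calling sort_genes with your source directory and the folder that holds 200, 400, ... would do the actual copying; swap cp for mv if you want the files moved. Each file gets exactly one bucket, so nothing ends up in more than one folder.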