
How to sort and copy numbered files into incremented folders


So I have gene files named gene_1.fa, gene_2.fa, ... gene_19500.fa and want to sort them into folders 200, 400, 600, ... 19600 for a downstream pipeline. I have an idea of how to do this, but it's pretty gruesome:

for file in "${files[@]}"; do

    base_name=$(basename "$file")
    gene_number=$(echo "$base_name" | cut -d'_' -f2 | cut -d'.' -f1)
    to_path= (path to folder containing 200, 400, ... 19600 folders)
    
    # if it's gene_200.fa, gene_400.fa etc., copy into that dir
    if (( gene_number % 200 == 0 )); then
        cp "$file" "$to_path/$gene_number/$base_name"
    elif (( gene_number < 200 )); then
        cp "$file" "$to_path/200/$base_name"
    elif (( gene_number > 19400 )); then
        cp "$file" "$to_path/19600/$base_name"
    # the endless pain of 200-400, 400-600, 600-800 ... 19200-19400
    elif (( gene_number > 200 && gene_number < 400 )); then
        cp "$file" "$to_path/400/$base_name"
    elif ....

My question is then: is there a less tedious way to do this without copying any one file into multiple folders? (E.g. if I only tested gene number < folder name, a file named gene_3.fa would be copied into every folder.)


Solution

  • You could do this; just change the for loop to iterate over your files, set delta to 200, and add the cp or mv as you like:

    #!/usr/bin/env bash
    
    delta=5
    for file in gene_{1..20}.fa; do
        if [[ "$file" =~ [0-9]+ ]]; then
            gene_number="${BASH_REMATCH[0]}"
            bucket=$(( ((gene_number / delta) * delta) + delta ))
            echo "$file -> $bucket"
        fi
    done
    

    $ ./tst.sh
    gene_1.fa -> 5
    gene_2.fa -> 5
    gene_3.fa -> 5
    gene_4.fa -> 5
    gene_5.fa -> 10
    gene_6.fa -> 10
    gene_7.fa -> 10
    gene_8.fa -> 10
    gene_9.fa -> 10
    gene_10.fa -> 15
    gene_11.fa -> 15
    gene_12.fa -> 15
    gene_13.fa -> 15
    gene_14.fa -> 15
    gene_15.fa -> 20
    gene_16.fa -> 20
    gene_17.fa -> 20
    gene_18.fa -> 20
    gene_19.fa -> 20
    gene_20.fa -> 25
    

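    The math works because bash does integer arithmetic, not floating point, so gene_number / delta discards any fractional part. Multiplying back by delta gives the largest multiple of delta that is not greater than gene_number, and adding delta moves up to the next bucket. Note that an exact multiple of delta therefore lands in the following bucket (gene_5.fa -> 10 in the output above).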
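
If exact multiples should stay in their own folder (gene_200.fa in 200, as the question describes), rounding up to the next multiple is one option instead of rounding down and adding delta. Below is a minimal sketch, assuming the files are named gene_<N>.fa with no leading zeros in N. The function names bucket_for and sort_genes are hypothetical, and the paths in the example call are placeholders:

```bash
#!/usr/bin/env bash

# Hypothetical helper: round n UP to the nearest multiple of delta,
# so with delta=200: 1..200 -> 200, 201..400 -> 400, and so on.
bucket_for() {
    local n=$1 delta=$2
    echo $(( (n + delta - 1) / delta * delta ))
}

# Hypothetical helper: copy every gene_<N>.fa in src_dir into dest_dir/<bucket>/.
# Assumes N has no leading zeros (a leading 0 would be read as octal).
sort_genes() {
    local src_dir=$1 dest_dir=$2 delta=${3:-200}
    local file name gene_number bucket
    for file in "$src_dir"/gene_*.fa; do
        [ -e "$file" ] || continue          # the glob matched nothing
        name=${file##*/}                    # strip the directory part
        gene_number=${name#gene_}           # drop the "gene_" prefix
        gene_number=${gene_number%.fa}      # drop the ".fa" suffix
        case $gene_number in
            ''|*[!0-9]*) continue ;;        # skip names without a plain number
        esac
        bucket=$(bucket_for "$gene_number" "$delta")
        mkdir -p "$dest_dir/$bucket"        # create the folder if it is missing
        cp -- "$file" "$dest_dir/$bucket/"
    done
}

# Example call (placeholder paths):
# sort_genes /path/to/genes /path/to/buckets 200
```

Each file is copied exactly once because the bucket is computed from the gene number rather than tested against every folder. Swap cp for mv if the originals are not needed afterwards.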