How to rename multiple files reordering name parts and including ordered numbers at the front (bash)


I have a list of files in a folder named based on (a) sample name [sometimes with '_1' or '_2' for different individuals]; (b) job id [1-12]; and (c) chromosome number [chrI-chrXXI].

Ex:

    8116_1_chrI.vcf  #sample[8116]; jobId[1]; chr[chrI]
    8116_1_chrII.vcf  #sample[8116]; jobId[1]; chr[chrII]
    ...
    CSC0832_1_7_chrVIII.vcf  #sample[CSC0832_1]; jobId[7]; chr[chrVIII]
    CSC0832_1_7_chrXIX.vcf  #sample[CSC0832_1]; jobId[7]; chr[chrXIX]
    ...
    RNF2887_2_12_chrX.vcf  #sample[RNF2887_2]; jobId[12]; chr[chrX]
    RNF2887_2_12_chrXI.vcf  #sample[RNF2887_2]; jobId[12]; chr[chrXI]
    ...

All files for a given sample share the same job id, with one file per chromosome. I want to submit a job array, so I need a unique identifier for every single file. I am trying to rename the files so that each name consists of (1) a unique number at the front; (2) the sample id; and (3) the chromosome number.

I tried a bash for loop for that, but it is not working. Below is my script:

    for FILENAME in `ls $SCRATCH/stickleback/sorelData/indSamplesVcf/splitChr/*.vcf`; do
    ROOTNAME=`basename ${FILENAME%%_*}`
    CHR=`basename ${FILENAME##*_} .vcf`
    for LIST in `seq 279`; do
    cp "$FILENAME" $SCRATCH/stickleback/sorelData/indSamplesVcf/splitCopy/${LIST}_${ROOTNAME}_${CHR}.vcf
    echo "copying $(basename ${FILENAME}) to ${LIST}_${ROOTNAME}_${CHR}.vcf"
    done
    done

What I get is files with unique numbers, but always with the same sample id and the same chromosome number:

    1_8116_chrIII.vcf
    2_8116_chrIII.vcf
    ...

One thing I noticed is that when I echo basename ${FILENAME##*_}, it lists the chromosomes in alphabetical rather than numerical order (because they are Roman numerals). Will that also affect the renaming?

Sorry for the long and silly question, but I am a newbie at this.

Thank you!


Solution

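  • If it helps: in your script the inner seq 279 loop runs in full for every file, so each file is copied 279 times. A single counter that advances once per file avoids that: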

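    directory=$SCRATCH/stickleback/sorelData/indSamplesVcf/splitChr
    target=$SCRATCH/stickleback/sorelData/indSamplesVcf/splitCopy   # destination from the question
    
    list=0
    for filename in "$directory"/*.vcf ; do
        basename=$( basename "$filename" )  # 8116_1_chrI.vcf
        sample=${basename%%_*}              # 8116 (text before the first "_")
        chr=${basename##*_}                 # chrI.vcf (text after the last "_")
        list=$(( list+1 ))                  # one new number per file
        # $filename already contains the directory, so it is not prepended again
        cp "$filename" "$target/${list}_${sample}_${chr}"
        echo "copying $basename to ${list}_${sample}_${chr}"
    done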
    

    I assume:

    • that you want to add a unique ID ($list) to each filename
    • that the filename becomes id_sample_chr.extension
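Note that `${basename%%_*}` keeps only the text before the first underscore, so `CSC0832_1` and `CSC0832_2` both become `CSC0832`. If the individual suffix should stay in the new name, one option is to strip the last two fields instead. This is a minimal sketch, assuming every name follows the `sample_jobId_chr.vcf` pattern from the question; `split_name` is a hypothetical helper name, not something from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split "sample_jobId_chr.vcf" into sample and chr.
# Assumes the last two underscore-separated fields are always jobId and chr.
split_name() {
    local base=$1              # e.g. CSC0832_1_7_chrVIII.vcf
    local chr=${base##*_}      # chrVIII.vcf  (text after the last "_")
    local rest=${base%_*}      # CSC0832_1_7  (drops "_chrVIII.vcf")
    local sample=${rest%_*}    # CSC0832_1    (drops "_7", the job id)
    printf '%s %s\n' "$sample" "$chr"
}

split_name 8116_1_chrI.vcf            # prints: 8116 chrI.vcf
split_name CSC0832_1_7_chrVIII.vcf    # prints: CSC0832_1 chrVIII.vcf
split_name RNF2887_2_12_chrX.vcf      # prints: RNF2887_2 chrX.vcf
```

Inside the loop, the same two expansions (`rest=${basename%_*}` followed by `sample=${rest%_*}`) could replace the single `%%_*` expansion.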

    I suggest:

    • use lowercase names for script variables
    • prefer $( command ) over backticks
    • do not parse $( ls ); loop over a glob instead
    • indent the loop bodies
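On the ordering question: a glob such as `*.vcf` expands in collation order, not karyotype order, so `chrIX` comes before `chrV`. That only changes which number each file receives; every file still gets a unique one. If karyotype order matters, a rough sketch like the one below could map each name to its position. `chroms` and `chr_index` are hypothetical names, and the chrI to chrXXI range is taken from the question.

```shell
#!/usr/bin/env bash
# Chromosome names in karyotype order, chrI to chrXXI (range from the question).
chroms=(I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI XVII XVIII XIX XX XXI)

# Hypothetical helper: print the 1-based position of a name such as "chrIX".
chr_index() {
    local name=${1#chr} i
    for i in "${!chroms[@]}"; do
        if [[ ${chroms[i]} == "$name" ]]; then
            echo $(( i + 1 ))
            return 0
        fi
    done
    return 1    # not a known chromosome name
}

chr_index chrIX    # prints: 9
chr_index chrV     # prints: 5

# Order a few file names by chromosome: prefix the index, sort numerically,
# then drop the index again. This sorts by chromosome only, not by sample.
for f in 8116_1_chrIX.vcf 8116_1_chrV.vcf 8116_1_chrI.vcf; do
    chr=${f##*_}        # chrIX.vcf
    chr=${chr%.vcf}     # chrIX
    printf '%02d %s\n' "$(chr_index "$chr")" "$f"
done | sort -n | cut -d' ' -f2
# prints 8116_1_chrI.vcf, 8116_1_chrV.vcf, 8116_1_chrIX.vcf (one per line)
```

Feeding the sorted list into the numbering loop would then hand out the IDs in chromosome order.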