bash sed rename

Remove middle of filenames


I have a list of filenames like this in bash:

UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz
UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R2.fq.gz
UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R1.fq.gz
UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz
UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R1.fq.gz
UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R2.fq.gz

And I want them to look like this:

UTSHoS10_R1.fq.gz
UTSHoS10_R2.fq.gz
UTSHoS11_R1.fq.gz
UTSHoS11_R2.fq.gz
UTSHoS12_R1.fq.gz
UTSHoS12_R2.fq.gz

I do not have the Perl rename command, and sed 's/_Other*160418./_/' *.gz is not doing anything. I've tried other rename scripts on here, but either nothing happens or my shell starts printing huge amounts of code to the console and freezes.

This post (Removing Middle of Filename) is similar; however, the answers given do not explain what the specific parts of the command are doing, so I could not apply them to my problem.


Solution

  • You can do something like this in the directory which contains the files to be renamed:

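    for file_name in *.gz
    do
      # Feed the name (not the file contents) to sed and capture the edited name
      new_file_name=$(sed 's/_[^.]*\./_/g' <<< "$file_name")
      # Rename the file; the quotes protect names containing spaces
      mv "$file_name" "$new_file_name"
    done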
    

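    The pattern (_[^.]*\.) matches from the FIRST _ through the FIRST . that follows it (both inclusive). [^.]* means 0 or more non-dot (or non-period) characters, and \. is a literal dot. The whole match is replaced with a single _, which leaves the sample prefix, one underscore, and everything after that dot. The g flag repeats the substitution for any later match in the same name; these names have only one. Because sed reads the name from the here-string (<<< "$file_name"), it edits the filename text itself rather than the contents of the file.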
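To see why the attempt in the question left the names untouched, it may help to run the three expressions against one name as plain text. This is only a sketch: the variable names (name, attempt, fixed, answer) are made up for illustration. In a regular expression, r* means "zero or more r", not "anything", so the original pattern never reaches 160418. Also, sed given *.gz as arguments reads the contents of those files; it never sees or changes their names.

```shell
name='UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz'

# Attempt from the question: "_Othe", then zero or more "r", then "160418"
# right away. The name has "_CAAGCC..." there instead, so nothing matches.
attempt=$(printf '%s\n' "$name" | sed 's/_Other*160418./_/')

# Regex spelling of the intended wildcard: ".*" is "any run of characters",
# and "\." is a literal dot.
fixed=$(printf '%s\n' "$name" | sed 's/_Other.*160418\./_/')

# Pattern from the answer: first "_" through the first "." after it.
answer=$(printf '%s\n' "$name" | sed 's/_[^.]*\./_/')

printf '%s\n' "$attempt" "$fixed" "$answer"
```

The first line printed is the unchanged name; the other two are UTSHoS10_R1.fq.gz. The answer's pattern has the advantage of not hard-coding Other or 160418.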

    Example:

    AMD$ ls
    UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz  UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R1.fq.gz
    UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R2.fq.gz  UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R2.fq.gz
    UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz
    
    AMD$ for file_name in *.gz
    > do new_file_name=$(sed 's/_[^.]*\./_/g' <<< "$file_name")
    > mv "$file_name" "$new_file_name"
    > done
    
    AMD$ ls
    UTSHoS10_R1.fq.gz  UTSHoS10_R2.fq.gz  UTSHoS11_R2.fq.gz  UTSHoS12_R1.fq.gz  UTSHoS12_R2.fq.gz
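
Since mv overwrites an existing target without asking, a more cautious variant of the loop could refuse to rename when the new name is already taken. The following is a sketch under assumptions, not part of the original answer: it runs in a scratch directory created with mktemp -d (widely available, though not guaranteed everywhere), uses empty stand-in files, and includes one hypothetical extra filename chosen so that two names collide after renaming. It pipes the name into sed with printf instead of a here-string so that it also runs in a plain POSIX sh.

```shell
# Scratch directory with empty stand-in files, so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz \
      UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R2.fq.gz \
      UTSHoS10_Extra_AAAAAA-TTAGGA_R_160419.R2.fq.gz   # hypothetical colliding name

for file_name in *.gz
do
  # Same substitution as the answer, applied to the name as text
  new_file_name=$(printf '%s\n' "$file_name" | sed 's/_[^.]*\./_/')

  # Nothing to do if the pattern did not match
  if [ "$file_name" = "$new_file_name" ]; then
    continue
  fi

  # Do not clobber a file that already has the target name
  if [ -e "$new_file_name" ]; then
    echo "skip: $file_name ($new_file_name already exists)" >&2
    continue
  fi

  echo "rename: $file_name -> $new_file_name"
  mv -- "$file_name" "$new_file_name"
done

ls
```

To preview the result on real files, the mv line can be commented out first so that the loop only prints what it would rename.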