bash shell sed rename mv

Batch script to remove parts of a filename within variable characters


Help please!

I have a set of 60 files named in the following format:

  • XXXXX_L2_R1_001_XneCgnfdkjTTTnm.fastq.gz
  • XXXXX_L2_R2_001_GmnbkjZZnvhkfPn.fastq.gz

and I would like to remove the "_L2" part and everything from the third underscore up to the extension, in order to have something like:

  • XXXXX_R1.fastq.gz
  • XXXXX_R2.fastq.gz

The number "XXXXX" varies between the files, and for each number there is always an R1 file and an R2 file.

Maybe a rename or a sed command can help.

Thanks!


Solution

  • Using the Perl-based rename utility (installed as prename or file-rename on some systems), you can do this:

    rename -n 's/^([^_]+)_L2(_[^_]+)[^.]+(\..+)$/$1$2$3/' *.gz
    

    Once you are satisfied with the dry-run output, remove the -n option and rerun the command.
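
If the Perl-based rename is not available, the same result can probably be had with a plain bash loop and mv. The following is only a minimal sketch, assuming every file follows the XXXXX_L2_R1_... / XXXXX_L2_R2_... pattern from the question and that XXXXX itself contains no underscore; the helper name shorten_name is made up for illustration.

```bash
#!/usr/bin/env bash
# Sketch: build the short name from the first field and the R1/R2 field.
# Assumes names look like <sample>_L2_<R1|R2>_<anything>.fastq.gz
shorten_name() {
    local f=$1
    local sample=${f%%_*}     # text before the first underscore
    local rest=${f#*_L2_}     # drop "<sample>_L2_"
    local pair=${rest%%_*}    # R1 or R2 (text up to the next underscore)
    printf '%s_%s.fastq.gz\n' "$sample" "$pair"
}

for f in *_L2_R[12]_*.fastq.gz; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    # Dry run: only prints the mv commands. Remove "echo" to actually rename.
    echo mv -n -- "$f" "$(shorten_name "$f")"
done
```

As with rename -n, the echo makes this a dry run; mv -n additionally refuses to overwrite an existing target once the echo is removed.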