How to pad a value with zeroes based on a match in a string and the length of the following string?

I have had trouble adapting the answers from previous questions, so I hope it is OK to ask for a specific solution.

I have a file with RNA reads in the fasta format; however, the end of each read name has been messed up, so I need to correct it.

It is a simple task of padding zeroes into the middle of a string, but I cannot get it to work because I also need to find the position of the number and check its length.

My read file header looks like this:

@V350037327L1C001R0010000023/1_U1

and I need to search for "/1_U" and then left-pad the number that follows it with zeroes to a total length of six digits. The result should look like this:

@V350037327L1C001R0010000023/1_U000001

The number following "/1_U" should always be six digits long. More examples (input = expected ending):

@V350037327L1C001R0010000055/1_U300 = /1_U000300
@V350037327L1C001R0010000122/1_U45000 = /1_U045000

I have tried with awk, but I cannot get it to check the initial length, so it does not pad with the correct number of zeroes.

Thank you in advance and thank you for your neverending support in this forum

Solution

Try this:

#! /bin/bash

files=('@V350037327L1C001R0010000023/1_U1'
       '@V350037327L1C001R0010000055/1_U300'
       '@V350037327L1C001R0010000122/1_U45000')

for file in "${files[@]}"; do
  if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
    printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
  fi
done
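
For readers unfamiliar with the regex match used above, here is a minimal sketch of what ends up in BASH_REMATCH after a successful match. The variable names header, prefix and digits are only illustrative and not part of the original script.

```shell
#!/bin/bash

# Illustrative header taken from the question.
header='@V350037327L1C001R0010000055/1_U300'

if [[ $header =~ ^(.*U)([0-9]+)$ ]]; then
  # Index 0 holds the whole match; 1 and 2 hold the two capture groups.
  prefix=${BASH_REMATCH[1]}   # everything up to and including the last "U"
  digits=${BASH_REMATCH[2]}   # the trailing run of digits
  printf 'prefix=%s\n' "$prefix"
  printf 'digits=%s\n' "$digits"
fi
```

The expansion "${BASH_REMATCH[@]:1}" in the script passes exactly these two elements to printf, which is why the format string needs one %s and one %06d.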

Update: This version reads the header lines from stdin.

#! /bin/bash

while read -r file; do
  if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
    printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
  fi
done

Update 2: To print lines that do not match unchanged, add an else branch. You should really learn the basics of shell programming before you start programming the shell; conditional constructs are a typical example of those basics.

#! /bin/bash

while read -r file; do
  if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
    printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
  else
    printf '%s\n' "$file"
  fi
done
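
One caveat, offered as a hedged sketch rather than a required change: bash's printf treats a number with a leading zero as octal, so a header that already ends in something like "/1_U08" would make %06d fail. The variant below anchors the match on the literal "/1_U" from the question and forces base 10 with the 10# prefix. The function name pad_header is hypothetical.

```shell
#!/bin/bash

# pad_header (hypothetical helper): zero-pad the digits after "/1_U"
# to six places; any other line is printed unchanged.
pad_header() {
  local line=$1
  if [[ $line =~ ^(.*/1_U)([0-9]+)$ ]]; then
    # 10# forces base 10, so digits such as "08" are not parsed as octal.
    printf '%s%06d\n' "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))"
  else
    printf '%s\n' "$line"
  fi
}
```

It could be applied to a whole file like this, where reads.fq and fixed.fq are placeholder file names:

while IFS= read -r line; do pad_header "$line"; done < reads.fq > fixed.fq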