bash unix substring

How to pad a value with zeroes based on a match in a string and the length of the following string?


I have had trouble adapting the answers to previous questions, so I hope it is OK to ask for a solution to my specific case.

I have a file with RNA reads in the fasta format; however, the end of each read name has been messed up, so I need to correct it.

It is a simple task of padding zeroes into the middle of a string; however, I cannot get it to work because I also need to identify the position of the problem and the length of the number that follows it.

My read file header looks like this:

@V350037327L1C001R0010000023/1_U1

I need to search for "/1_U" and then left-pad the rest of the line with zeroes to a total length of 6, so that it looks like this:

@V350037327L1C001R0010000023/1_U000001

The number following "/1_U" should always be six digits long. For example (input = padded result):

@V350037327L1C001R0010000055/1_U300 = /1_U000300
@V350037327L1C001R0010000122/1_U45000 = /1_U045000

I have tried awk, but I cannot get it to check the initial length, so it does not pad with the correct number of zeroes.

Thank you in advance, and thank you for the never-ending support in this forum.


Solution

  • Try this:

    #! /bin/bash
    
    files=('@V350037327L1C001R0010000023/1_U1'
           '@V350037327L1C001R0010000055/1_U300'
           '@V350037327L1C001R0010000122/1_U45000')
    
    for file in "${files[@]}"; do
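      # Group 1 captures everything up to the last "U", group 2 the trailing digits.
      if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
        # "${BASH_REMATCH[@]:1}" expands to both groups; %06d zero-pads the digits.
        printf '%s%06d\n' "${BASH_REMATCH[@]:1}"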
      fi
    done
    

    Update: This version reads the headers from stdin, one per line.

    #! /bin/bash
    
    while read -r file; do
      if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
        printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
      fi
    done
    

    Update 2: To pass lines that do not match through unchanged, add an else branch. Conditional constructs like this are among the basics of shell programming, and they are worth learning before you start writing scripts.

    #! /bin/bash
    
    while read -r file; do
      if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
        printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
      else
        printf '%s\n' "$file"
      fi
    done
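
As a variation on the loops above, here is a minimal sketch that wraps the same idea in a function. It rests on a few assumptions that are not part of the original answer: the helper name `pad_read_id` is hypothetical, the pattern is anchored on the literal "/1_U" from the question rather than on any "U", and `10#` forces base 10 so that digits with leading zeroes (such as "08") are not read as octal by bash arithmetic.

```shell
#!/bin/bash

# pad_read_id is a hypothetical helper name, not part of the original answer.
# It pads the digits after "/1_U" to six places and prints any other line as-is.
pad_read_id() {
  local line=$1
  if [[ $line =~ ^(.*/1_U)([0-9]+)$ ]]; then
    # 10# forces base 10, so digits such as "08" are not parsed as octal.
    printf '%s%06d\n' "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))"
  else
    printf '%s\n' "$line"
  fi
}

pad_read_id '@V350037327L1C001R0010000023/1_U1'      # @V350037327L1C001R0010000023/1_U000001
pad_read_id '@V350037327L1C001R0010000122/1_U45000'  # @V350037327L1C001R0010000122/1_U045000
```

To process a whole file, the function could be called inside a `while IFS= read -r line` loop with the file redirected to stdin.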