I have had trouble adapting the answers to previous questions, so I hope it is OK to ask for a specific solution.
I have a file with RNA reads in the fasta format. The end of each read name has been messed up, so I need to correct it.
It is a simple task of padding zeroes into the middle of a string, but I cannot get it to work because I also need to identify the position of the number and its current length.
My read file header looks like this:
@V350037327L1C001R0010000023/1_U1
and I need to find the "/1_U" and then left-pad the number that follows it with zeroes to a total length of 6. It should look like this:
@V350037327L1C001R0010000023/1_U000001
The number following "/1_U" should always end up six digits long, e.g. (input = output):
@V350037327L1C001R0010000055/1_U300 = /1_U000300
@V350037327L1C001R0010000122/1_U45000 = /1_U045000
I have tried with awk, but I cannot get it to check the initial length, so it does not pad the correct number of zeroes.
Thank you in advance, and thank you for your never-ending support in this forum.
Try this:
#! /bin/bash
files=('@V350037327L1C001R0010000023/1_U1'
       '@V350037327L1C001R0010000055/1_U300'
       '@V350037327L1C001R0010000122/1_U45000')
for file in "${files[@]}"; do
    if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
        printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
    fi
done
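To make the `"${BASH_REMATCH[@]:1}"` part less cryptic, here is a small sketch that prints what the regex captures for one of the sample headers. The variable names `header` and `padded` are only for illustration and are not part of the script above.

```shell
#!/bin/bash
header='@V350037327L1C001R0010000055/1_U300'

if [[ $header =~ ^(.*U)([0-9]+)$ ]]; then
    # Element 0 is the whole match, 1 is everything up to the "U", 2 is the trailing digits.
    printf 'whole match: %s\n' "${BASH_REMATCH[0]}"
    printf 'prefix:      %s\n' "${BASH_REMATCH[1]}"
    printf 'digits:      %s\n' "${BASH_REMATCH[2]}"

    # "${BASH_REMATCH[@]:1}" expands to elements 1 and 2, which become
    # the %s and %06d arguments of printf.
    padded=$(printf '%s%06d' "${BASH_REMATCH[@]:1}")
    printf 'padded:      %s\n' "$padded"
fi
```

The `%06d` format is what does the padding: it prints the number at least six characters wide and fills the left side with zeroes, so there is no need to measure the original length.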
Update: This reads the files from stdin.
#! /bin/bash
while read -r file; do
    if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
        printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
    fi
done
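As a rough usage sketch, the same loop can be wrapped in a function and fed from a file with a redirection. The function name `pad_stream` and the temporary demo file are assumptions for illustration; in practice the redirection would point at your own read file.

```shell
#!/bin/bash
# Same loop as above, wrapped in a function (hypothetical name) that reads stdin.
pad_stream() {
    local line
    while IFS= read -r line; do
        if [[ $line =~ ^(.*U)([0-9]+)$ ]]; then
            printf '%s%06d\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
        fi
    done
}

# Demo input in a temporary file, standing in for the real read file.
tmp=$(mktemp)
printf '%s\n' \
    '@V350037327L1C001R0010000055/1_U300' \
    'ACGUACGU' \
    '@V350037327L1C001R0010000122/1_U45000' > "$tmp"

pad_stream < "$tmp"
# prints:
# @V350037327L1C001R0010000055/1_U000300
# @V350037327L1C001R0010000122/1_U045000

rm -f "$tmp"
```

Note that this variant prints only the lines that match; the sequence line `ACGUACGU` in the demo input is dropped.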
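Update 2: You should really learn the basics of shell programming before you start programming the shell. Typical basics are conditional constructs. With an else branch, every line that does not match is printed unchanged, so the rest of the file is kept: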
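#! /bin/bash
# IFS= and -r keep each line exactly as it is (no trimming, no backslash handling).
while IFS= read -r file; do
    if [[ $file =~ ^(.*U)([0-9]+)$ ]]; then
        # Header line: print the prefix, then the number zero-padded to 6 digits.
        printf '%s%06d\n' "${BASH_REMATCH[@]:1}"
    else
        # Any other line: pass through unchanged.
        printf '%s\n' "$file"
    fi
done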
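One caveat with `%06d`: bash's printf reads a number with a leading zero as octal, so a header that already ends in something like `U08` produces an "invalid octal number" error. A minimal sketch, assuming such headers can occur in your data, forces base 10 with arithmetic expansion. The function name `pad_header` is made up for illustration, and the pattern is narrowed to "/1_U" as described in the question so that fewer unrelated lines can match.

```shell
#!/bin/bash
# Hypothetical helper: pad the number after "/1_U" to six digits, base-10 safe.
pad_header() {
    local line=$1
    if [[ $line =~ ^(.*/1_U)([0-9]+)$ ]]; then
        # 10#... makes bash read the digits as decimal even with leading zeroes.
        printf '%s%06d\n' "${BASH_REMATCH[1]}" "$((10#${BASH_REMATCH[2]}))"
    else
        # Not a matching header: print the line unchanged.
        printf '%s\n' "$line"
    fi
}

pad_header '@V350037327L1C001R0010000023/1_U08'
# prints: @V350037327L1C001R0010000023/1_U000008
pad_header '@V350037327L1C001R0010000122/1_U45000'
# prints: @V350037327L1C001R0010000122/1_U045000
```

To process a whole file, call `pad_header "$file"` inside the `while IFS= read -r file` loop in place of the if/else.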