I have a .txt-file like this:
'SMb_TSS0303' '171765' '171864' '-' 'NC_003078' 'SMb20154'
'SMb_TSS0302' '171758' '171857' '-' 'NC_003078' 'SMb20154'
I want to extract the following as parameters:
-'SMb'
-'171765'
-'171864'
-'-' (minus)
-> need them without quotes
I am trying to do this in a shell script:
#!/bin/sh
file=$1
cat "$1"|while read line; do
echo "$line"
parent=$(awk {'print substr($line,$0,5)'})
echo "$parent"
done
This echoes 'SMb
As far as I understand awk's substr, I thought it would work like this:
substr(s, a, b) => returns b characters from string s, starting at position a
First, I do not understand why I get 'SMb with 0-5. Second, I can't extract any of the other parameters I need, because moving the start position does not work. E.g. $1,6 gives an empty echo, where I would expect Mb_TSS.
#!/bin/sh
file=$1
cat "$1"|while read line; do
parent=$(awk {'print substr($line,$0,5)'})
start=$(awk {'print substr($line,?,?)'})
end=$(awk {'print substr($line,?,?)'})
strand=$(awk {'print substr($line,?,?)'})
done
echo "$parent" -> echoes SMb
echo "$start" -> echoes 171765
echo "$end" -> echoes 171864
echo "$strand" -> echoes -
My guess is that the items in each line are treated as single strings or something similar. Maybe I am also parsing the file incorrectly, but nothing I have tried works.
It's really unclear exactly what you're trying to do, but I can at least help you with the awk syntax:
while read -r line
do
    # $1 and $2 are awk's whitespace-separated fields; position 1 of each is the opening quote
    parent=$(echo "$line" | awk '{print substr($1,2,3)}')
    start=$(echo "$line" | awk '{print substr($2,2,6)}')
    echo "$parent"
    echo "$start"
done < file
This outputs:
SMb
171765
SMb
171758
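As for why the attempt in the question prints 'SMb: inside a single-quoted awk program, `$line` is not the shell variable. awk treats `line` as its own, unset variable, so `$line` ends up meaning `$0`, the whole record. The start argument `$0` is a string that converts to the number 0, and since substr counts positions from 1, a start of 0 with length 5 covers positions 0 to 4, which leaves only the first four real characters. awk called without a file name also reads the loop's standard input, so it works on the remaining lines of the file rather than on `$line`. The sketch below shows one way to hand a shell value to awk with `-v`; the helper name `strip_outer` is made up for this illustration.

```shell
#!/bin/sh
# strip_outer is a hypothetical helper name used only for this illustration.
# The shell value is passed to awk with -v, because awk never sees shell
# variables that are written inside a single-quoted program.
strip_outer() {
    # Position 1 is the opening quote, so start at 2 and drop 2 characters in total.
    awk -v s="$1" 'BEGIN { print substr(s, 2, length(s) - 2) }'
}

strip_outer "'SMb_TSS0303'"   # prints SMb_TSS0303
strip_outer "'171765'"        # prints 171765
```

Because the program only has a BEGIN block, awk does not read standard input here, so it cannot swallow lines that a surrounding `while read` loop still needs.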
You should be able to figure out how to get the rest of the fields.
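For reference, the same pattern extended to all four values might look like the sketch below. It assumes the strand is always the fourth field and a single character, and it uses `length()` so the start and end coordinates do not have to be exactly six digits. The function name `parse_line` is hypothetical.

```shell
#!/bin/sh
# parse_line is a hypothetical helper; it takes one line of the input file
# as its argument and prints parent, start, end and strand without quotes.
parse_line() {
    line=$1
    parent=$(printf '%s\n' "$line" | awk '{print substr($1,2,3)}')
    # length($n)-2 removes both surrounding quotes, whatever the number width
    start=$(printf '%s\n' "$line" | awk '{print substr($2,2,length($2)-2)}')
    end=$(printf '%s\n' "$line" | awk '{print substr($3,2,length($3)-2)}')
    strand=$(printf '%s\n' "$line" | awk '{print substr($4,2,1)}')
    printf '%s %s %s %s\n' "$parent" "$start" "$end" "$strand"
}

# Sample line copied from the question
parse_line "'SMb_TSS0303' '171765' '171864' '-' 'NC_003078' 'SMb20154'"
```

For the sample line this prints `SMb 171765 171864 -`.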
This is quite an inefficient way to do it, since it starts a new awk process for every field of every line, but based on the information in the question I'm unable to provide a better answer at the moment.
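If the goal really is just those four values per line, one possible way to avoid the per-line processes is a single awk pass that strips the quotes and lets `read` split the result into shell variables. This is only a sketch under that assumption: `extract_fields` is a made-up name, and the sample file is created on the fly from the two lines in the question.

```shell
#!/bin/sh
# extract_fields is a hypothetical helper name. It runs awk once over the
# whole file and prints "parent start end strand" for every line.
extract_fields() {
    awk -v q="'" '{
        gsub(q, "")                          # remove the quote characters; awk re-splits the fields
        print substr($1, 1, 3), $2, $3, $4   # first 3 characters of field 1, then fields 2 to 4
    }' "$1"
}

# Build a throwaway sample file from the data shown in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
'SMb_TSS0303' '171765' '171864' '-' 'NC_003078' 'SMb20154'
'SMb_TSS0302' '171758' '171857' '-' 'NC_003078' 'SMb20154'
EOF

# read splits each output line into four shell variables.
extract_fields "$sample" | while read -r parent start end strand; do
    printf '%s %s %s %s\n' "$parent" "$start" "$end" "$strand"
done

rm -f "$sample"
```

Note that the variables set inside the loop are only visible inside it, because the right-hand side of a pipe may run in a subshell; whatever needs the values should happen in the loop body.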