
Linux Bash: Use awk(substr) to get parameters from file input


I have a .txt-file like this:

'SMb_TSS0303'   '171765'    '171864'    '-' 'NC_003078' 'SMb20154'  
'SMb_TSS0302'   '171758'    '171857'    '-' 'NC_003078' 'SMb20154'

I want to extract the following as parameters:

- 'SMb'

- '171765'

- '171864'

- '-' (minus)

I need them without the quotes.

I am trying to do this in a shell script:

#!/bin/sh
file=$1

cat "$1"|while read line; do
  echo "$line"
  parent=$(awk {'print substr($line,$0,5)'})
  echo "$parent"
done

This echoes 'SMb

As far as I understand awk's substr, I thought it would work like this:

substr(s, a, b) => returns b characters from string s, starting at position a

First, I do not understand why I get 'SMb with the arguments 0 and 5. Second, I cannot extract any of the other parameters I need, because moving the start position does not work. For example, $1,6 gives an empty echo, whereas I would expect Mb_TSS.

Desired final output:

#!/bin/sh

file=$1

cat "$1"|while read line; do
  parent=$(awk {'print substr($line,$0,5)'})
  start=$(awk {'print substr($line,?,?)'})
  end=$(awk {'print substr($line,?,?)'})
  strand=$(awk {'print substr($line,?,?)'})
done

echo "$parent"    -> echoes SMb
echo "$start"     -> echoes 171765
echo "$end"       -> echoes 171864
echo "$strand"    -> echoes -

My guess is that the items in each line are treated as single strings or something similar. Maybe I am also handling the file parsing wrongly, but nothing I have tried works.


Solution

  • It is really unclear exactly what you are trying to do, but I can at least help you with the awk syntax:

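    while read -r line
    do
        # Field 1 is 'SMb_TSS0303': skip the opening quote, take 3 characters.
        parent=$(echo "$line" | awk '{print substr($1,2,3)}')
        # Field 2 is '171765': skip the opening quote, take 6 characters.
        start=$(echo "$line" | awk '{print substr($2,2,6)}')
        echo "$parent"
        echo "$start"
    done < file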
    

    This outputs:

    SMb
    171765
    SMb
    171758
    

    You should be able to figure out how to get the rest of the fields.
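
As a rough illustration of why the start position is 2 in the commands above, the sketch below assumes a POSIX-style awk, where substr counts characters from 1, so position 1 of each field is the opening quote. The variable name field is only for this demo. Note also that inside a single-quoted awk program, $line is not the shell variable: awk sees an unset variable named line, treats it as 0, and therefore reads $line as $0 (the whole input line).

```shell
#!/bin/sh
# Sample field copied from the question, including its single quotes.
# "field" is a demo-only variable name.
field="'SMb_TSS0303'"

# Positions are 1-based: character 1 is the opening quote.
printf '%s\n' "$field" | awk '{ print substr($1, 1, 4) }'   # 'SMb
printf '%s\n' "$field" | awk '{ print substr($1, 2, 3) }'   # SMb
printf '%s\n' "$field" | awk '{ print substr($1, 3, 6) }'   # Mb_TSS
```

A start of 0, as in the question, lies before the first character. Many awk implementations then return one character fewer than requested, which would explain the 'SMb result, though the exact behaviour may differ between implementations.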

    This is quite an inefficient way to do this, since it starts a separate awk process for every value on every line, but based on the information in the question I'm unable to provide a better answer at the moment.
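
One possible way to avoid the per-line awk calls, sketched here under the assumption that the file always has whitespace-separated, single-quoted fields as in the sample: strip the quotes once with tr and let read split each line into variables. The function name parse_tss and the temporary sample file are hypothetical and only serve the demo.

```shell
#!/bin/sh
# parse_tss is a hypothetical helper name, not something from the question.
# It prints "parent start end strand" for every line of the file given as $1.
parse_tss() {
    # tr removes every single quote; read splits each line on whitespace.
    tr -d "'" < "$1" | while read -r id start end strand rest
    do
        parent=${id%%_*}    # keep the text before the first underscore
        printf '%s %s %s %s\n' "$parent" "$start" "$end" "$strand"
    done
}

# Demo input: the first sample line from the question.
sample=$(mktemp)
printf '%s\n' "'SMb_TSS0303'   '171765'    '171864'    '-' 'NC_003078' 'SMb20154'" > "$sample"
parse_tss "$sample"    # SMb 171765 171864 -
rm -f "$sample"
```

Because the loop sits on the right-hand side of a pipe, it typically runs in a subshell, so parent, start, end and strand should be used inside the loop body rather than after done.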