
Execute bash command inside awk and print command output


Given a file test.txt with the following contents:

ABC DEF GATTAG GHK
ABC DEF GGCGTC GHK 
ABC DEF AATTCC GHK

the 3rd column needs to be modified so that the string is replaced by its reverse complement. Part of this can be done with a bash command:

cat test.txt | cut -f3 | rev | tr ATGC TACG

CTAATC
GACGCC
GGAATT

How can this be implemented with awk? (There is a larger awk script for processing files, to which this function will be added.)

One possible way is to execute rev | tr ATGC TACG from inside awk, along these lines:

awk '{newVar=system("rev | tr ATGC TACG"$3); print $1 $2 newVar $4}' test.txt

However, this and various similar versions do not work. Can someone point out what is incorrect?


Solution
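
The attempt in the question fails for two separate reasons: the string concatenation glues $3 onto the end of the tr argument instead of feeding it to rev, and system() returns the command's exit status rather than its output. The following is only a small sketch to make both points visible; the variable names built, out and rc are made up for this illustration.

```shell
# 1) The command string the question's one-liner actually builds:
#    the field is appended to tr's second argument, and rev gets no input from it.
built=$(echo 'ABC DEF GATTAG GHK' | awk '{ print "rev | tr ATGC TACG" $3 }')
printf '%s\n' "$built"    # rev | tr ATGC TACGGATTAG

# 2) system() runs the command, lets it write straight to stdout,
#    and returns only the exit status (0 on success here).
out=$(awk 'BEGIN { rc = system("echo GATTAG | rev | tr ATGC TACG"); print "rc=" rc }')
printf '%s\n' "$out"      # CTAATC, then rc=0 on the next line
```

So newVar in the question would hold an exit status, never the transformed text. Capturing output needs a pipe into getline, or no external command at all.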

  • Just do the string reversal and translation in awk itself:

    $ awk '
        BEGIN {
            old="ATGC"
            new="TACG"
            for (i=1;i<=length(old);i++) {
                tr[substr(old,i,1)] = substr(new,i,1)
            }
        }
        {
            newVar=""
            for (i=1;i<=length($3);i++) {
                char = substr($3,i,1)
                newVar = (char in tr ? tr[char] : char) newVar
            }
            print $1, $2, newVar, $4
        }
    ' test.txt
    ABC DEF CTAATC GHK
    ABC DEF GACGCC GHK
    ABC DEF GGAATT GHK
    

    If you really feel a burning need to call an external tool from awk and read the result back, that would be:

    $ awk '
        {
            cmd="echo \047" $3 "\047 | rev | tr \047ATGC\047 \047TACG\047"
            newVar=((cmd | getline line) > 0 ? line : "failed")
            close(cmd)
            print $1, $2, newVar, $4
        }
    ' test.txt
    ABC DEF CTAATC GHK
    ABC DEF GACGCC GHK
    ABC DEF GGAATT GHK
    

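    but you should expect a significant performance hit from doing that, since it spawns a shell pipeline for every input line, and a field containing a single quote would break the quoting. See also the getline caveats: http://awk.freeshell.org/AllAboutGetline.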
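
Since the question mentions a larger awk script that this will be added to, the pure-awk approach above could be wrapped in a user-defined function. This is a minimal sketch, not part of the original answer: the function name revcomp and its local variable names are hypothetical, and test.txt is recreated from the question so the example runs on its own.

```shell
# Recreate the sample input from the question
printf '%s\n' 'ABC DEF GATTAG GHK' 'ABC DEF GGCGTC GHK' 'ABC DEF AATTCC GHK' > test.txt

result=$(awk '
    # revcomp() is a hypothetical helper name chosen for this sketch.
    # The extra parameters i, char and out act as local variables.
    function revcomp(seq,    i, char, out) {
        out = ""
        for (i = 1; i <= length(seq); i++) {
            char = substr(seq, i, 1)
            # Prepending each translated character reverses the string
            out = (char in tr ? tr[char] : char) out
        }
        return out
    }
    BEGIN {
        # Build the complement lookup table once
        old = "ATGC"; new = "TACG"
        for (i = 1; i <= length(old); i++)
            tr[substr(old, i, 1)] = substr(new, i, 1)
    }
    { print $1, $2, revcomp($3), $4 }
' test.txt)
printf '%s\n' "$result"
```

The function body can be pasted into an existing script alongside its BEGIN block, and any field can then be passed through revcomp() without starting a subprocess.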