Given a file test.txt with the following contents:
ABC DEF GATTAG GHK
ABC DEF GGCGTC GHK
ABC DEF AATTCC GHK
The 3rd column needs to be modified so that each string is replaced by its reverse complement. Part of this can be done with a bash command:
cat test.txt | cut -f3 | rev | tr ATGC TACG
CTAATC
GACGCC
GGAATT
How can this be implemented with awk? (There is a bigger awk script for processing files, which this function will be added to.)
One possible way this might be done is by executing rev | tr ATGC TACG inside awk, similar to:
awk '{newVar=system("rev | tr ATGC TACG"$3); print $1 $2 newVar $4}' test.txt
However, this and various similar versions do not work. Can someone point out what is incorrect?
Just do the string reversal and translation in awk itself:
$ awk '
    BEGIN {
        # Build a lookup table mapping each base to its complement.
        old="ATGC"
        new="TACG"
        for (i=1;i<=length(old);i++) {
            tr[substr(old,i,1)] = substr(new,i,1)
        }
    }
    {
        # Prepend each translated character, which reverses the string.
        newVar=""
        for (i=1;i<=length($3);i++) {
            char = substr($3,i,1)
            newVar = (char in tr ? tr[char] : char) newVar
        }
        print $1, $2, newVar, $4
    }
' file
ABC DEF CTAATC GHK
ABC DEF GACGCC GHK
ABC DEF GGAATT GHK
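Since the question mentions that this will be added to a bigger awk script, the same logic could be wrapped in a user-defined function. The following is only a minimal sketch, assuming whitespace-separated fields; the names revcomp and comp are illustrative and not part of the original answer, and the sample data is written to a scratch file so the example runs on its own:

```shell
# Write the sample input from the question to a scratch file.
tmp=$(mktemp)
printf '%s\n' 'ABC DEF GATTAG GHK' 'ABC DEF GGCGTC GHK' 'ABC DEF AATTCC GHK' > "$tmp"

result=$(awk '
    # revcomp (hypothetical name): reverse complement of seq.
    # i, c and out are local variables, declared as extra parameters.
    function revcomp(seq,    i, c, out) {
        out = ""
        for (i = 1; i <= length(seq); i++) {
            c = substr(seq, i, 1)
            # Prepending the translated character reverses the string.
            out = (c in comp ? comp[c] : c) out
        }
        return out
    }
    BEGIN {
        # comp (hypothetical name): base to complement lookup table.
        comp["A"] = "T"; comp["T"] = "A"; comp["G"] = "C"; comp["C"] = "G"
    }
    { print $1, $2, revcomp($3), $4 }
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Characters that are not in the table (for example N) pass through unchanged, as in the inline version above.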
If you really feel a burning need to call an external tool from awk and read the result back, that'd be:
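$ awk '
    {
        # \047 is the octal escape for a single quote, used to quote the shell arguments.
        cmd="echo \047" $3 "\047 | rev | tr \047ATGC\047 \047TACG\047"
        # Read the first line of the command output, or fall back to "failed".
        newVar=((cmd | getline line) > 0 ? line : "failed")
        close(cmd)
        print $1, $2, newVar, $4
    }
' file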
ABC DEF CTAATC GHK
ABC DEF GACGCC GHK
ABC DEF GGAATT GHK
but you should expect a significant performance hit from doing that, since it spawns a shell and a pipeline for every input line. See also the getline caveats: http://awk.freeshell.org/AllAboutGetline.
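As for what is incorrect in the attempt from the question, two things stand out. First, system() returns the exit status of the command, not its output, so newVar would hold a number such as 0. Second, the string concatenation glues $3 onto the end of the command text instead of feeding it to rev as input. The sketch below only illustrates those two points; the variable names status and built are illustrative:

```shell
# system() yields the exit status of the command (0 for "true"), not its output.
status=$(awk 'BEGIN { rc = system("true"); print rc }')

# Printing the concatenated string shows the command awk would actually run:
# the third field ends up appended to the second argument of tr.
built=$(echo 'ABC DEF GATTAG GHK' | awk '{ print "rev | tr ATGC TACG" $3 }')

printf '%s\n%s\n' "$status" "$built"
```

The second value comes out as rev | tr ATGC TACGGATTAG, which is not the intended pipeline, and rev would be left waiting on standard input rather than receiving the field.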