I'm processing some SNP columns into VCF format.
The input columns are as follows:
ref ALT
A A G
A A T
T C T
G G T
A A G
C C G T
G A G
T C T
T A G T
Expected output (the ref allele removed from ALT, with the remaining alleles joined by commas):
ref ALT
A G
A T
T C
G T
A G
C G,T
G A
T C
T A,G
One way to do it with awk, assuming the two columns are tab-separated:
$ awk 'BEGIN{FS=OFS="\t"} NR>1{sub($1," ",$2); gsub(/^ +| +$/,"",$2); gsub(/ +/,",",$2)} 1' file
ref ALT
A G
A T
T C
G T
A G
C G,T
G A
T C
T A,G
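The above only works when $1 contains no regular-expression metacharacters and is not a substring of any other allele in $2. This is because sub() treats $1 as a regex and replaces its first match anywhere in the string, not just a whole allele.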
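If those conditions can't be guaranteed (for example, with multi-base alleles such as indels), a variant that compares whole alleles as plain strings should avoid both problems. This is a minimal sketch, not part of the original answer. It assumes the same tab-separated layout with space-separated alleles in column 2. The function name drop_ref and the file sample.tsv are hypothetical, used only for illustration.

```shell
#!/bin/sh
# drop_ref: hypothetical helper name used only for this sketch.
# Assumes tab-separated input: column 1 = ref, column 2 = space-separated alleles.
drop_ref() {
    awk 'BEGIN { FS = OFS = "\t" }
    NR > 1 {
        # split() with " " uses default whitespace splitting:
        # runs of blanks collapse, leading/trailing blanks are ignored
        n = split($2, alleles, " ")
        out = ""
        for (i = 1; i <= n; i++) {
            # plain string comparison: no regex, no substring matches
            if (alleles[i] != $1) {
                out = (out == "" ? alleles[i] : out "," alleles[i])
            }
        }
        $2 = out   # rebuild the record with OFS (tab)
    }
    1' "$1"
}

# Usage with a few rows from the question
printf 'ref\tALT\nA\tA G\nC\tC G T\nG\tA G\nT\tA G T\n' > sample.tsv
drop_ref sample.tsv
```

Because each allele is compared with != against the whole ref string, a ref of A leaves an allele such as AT untouched, which the sub()-based one-liner would mangle.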