I have a tab-delimited file which looks like this:
CHROM <TAB> POS <TAB> AD0062-C <TAB> AD0063-C <TAB> AD0065-C <TAB> AD0074-C
2L <TAB> 440 <TAB>0/1:63:60,0,249 <TAB>0/1:89:86,0,166 <TAB>1/1:96:107,24,0<TAB>1/1:49:42,6,0
2L <TAB> 260<TAB>0/1:66:63,0,207<TAB> 1/1:99:227,111,0<TAB>1/1:99:255,144,0<TAB> 1/1:49:42,6,0
2L <TAB> 595 <TAB> 0/1:11:85,0,8 <TAB>0/1:13:132,0,10 <TAB>0/1:73:70,0,131<TAB> 0/1:59:72,0,56
I want to keep only the first 3 characters of each field from column 3 onward, so that I get an output that looks like this:
CHROM <TAB> POS <TAB> AD0062-C <TAB> AD0063-C <TAB> AD0065-C <TAB> AD0074-C
2L <TAB> 440 <TAB> 0/1 <TAB> 0/1 <TAB> 1/1 <TAB> 1/1
2L <TAB> 260 <TAB> 0/1 <TAB> 1/1 <TAB> 1/1 <TAB> 1/1
2L <TAB> 595 <TAB> 0/1 <TAB> 0/1 <TAB> 0/1 <TAB> 0/1
Thanks
Using awk. For every line except the first, if it has more than two fields, replace each field from the third onward with its first three characters. The print
command runs for every line because it has no condition.
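awk '
    BEGIN { OFS = "\t" }              # join output fields with tabs
    # Skip the header (FNR == 1) and any line with fewer than three fields
    NF > 2 && FNR > 1 {
        for ( i = 3; i <= NF; i++ ) {
            $i = substr( $i, 1, 3 )   # keep only the first three characters
        }
    }
    { print }                         # no condition, so every line is printed
' infile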
Output:
CHROM POS AD0062-C AD0063-C AD0065-C AD0074-C
2L 440 0/1 0/1 1/1 1/1
2L 260 0/1 1/1 1/1 1/1
2L 595 0/1 0/1 0/1 0/1
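If a genotype is not always exactly three characters long (for example an allele index with two digits, such as 0/10), cutting at a fixed width would truncate it. A variant that keeps everything before the first colon may be safer. This is only a minimal sketch: it assumes every data field is colon-separated as in the sample above, and the function name trim_genotypes is made up for illustration.

```shell
# Hypothetical helper: keep only the part of each field before the first ":"
# from column 3 onward, leaving the header line untouched.
trim_genotypes() {
    awk '
        BEGIN { OFS = "\t" }              # join output fields with tabs
        NF > 2 && FNR > 1 {
            for ( i = 3; i <= NF; i++ ) {
                sub( /:.*/, "", $i )      # drop the first colon and everything after it
            }
        }
        { print }
    ' "$@"
}

# Small made-up sample, piped in on stdin
printf 'CHROM\tPOS\tAD0062-C\tAD0063-C\n2L\t440\t0/1:63:60,0,249\t1/1:96:107,24,0\n' |
    trim_genotypes
```

With a file instead of a pipe, it would be called as trim_genotypes infile.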