I have this kind of input:
rs10000004 C T 4 rs10000004 0 75625312 C C C C T 0 C T
rs10000005 G A 4 rs10000005 0 75625355 G 0 A A A G A A
I want to replace every column from the 8th to the last with "A" if its value is identical to the second field ($2), or with "B" if it is identical to the third field ($3). Otherwise the value should be printed as it is (zeros are expected in some columns).
Expected output:
rs10000004 C T 4 rs10000004 0 75625312 A A A A B 0 A B
rs10000005 G A 4 rs10000005 0 75625355 A 0 B B B A B B
I tried the following, but it prints only empty lines. I would prefer a fix to my own code over a new solution that uses something other than awk.
cat input | awk '{ for(i=8; i<=NF; i++) { if($i == $2) $i="A"; else if($i == $3) $i="B"; else $i == 0; } print $i }'
Thanks in advance
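Your loop is nearly right; two things break it. The "print $i" runs after the loop, when i is NF+1, so it prints an empty field; a plain "print" outputs the whole modified line. And "$i == 0" is a comparison, not an assignment, so it does nothing. Since other values should stay as they are, the final else can simply be dropped.
Code :
awk '
{
    for (i=8; i<=NF; i++) {
        if ($i == $2) {
            $i = "A";    # same as the 2nd field
        }
        else if ($i == $3) {
            $i = "B";    # same as the 3rd field
        }
        # anything else (e.g. 0) is left unchanged
    }
    print;               # print the whole rebuilt line
}' input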
Or shorter (the trailing 1 is an always-true pattern, so awk prints every line) :
awk '
{
    for (i=8; i<=NF; i++) {
        if ($i == $2)
            $i="A";
        else if ($i == $3)
            $i="B";
    }
}
1' input
Output :
rs10000004 C T 4 rs10000004 0 75625312 A A A A B 0 A B
rs10000005 G A 4 rs10000005 0 75625355 A 0 B B B A B B
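To check the "print it as it is" rule on a value that is neither $2, $3, nor 0, here is a small sketch that wraps the same loop in a shell function. The function name recode and the test line are made up for illustration; they are not part of the original data.

```shell
# recode: hypothetical helper wrapping the awk loop from the answer.
# It reads the files given as arguments, or stdin if none are given.
recode() {
  awk '{
    for (i = 8; i <= NF; i++) {
      if ($i == $2)      $i = "A"   # matches the 2nd field
      else if ($i == $3) $i = "B"   # matches the 3rd field
      # any other value is left untouched
    }
    print
  }' "$@"
}

# Made-up line: column 10 is 0 and column 11 is G (neither C nor T).
printf '%s\n' 'rs1 C T 4 rs1 0 100 C T 0 G' | recode
```

With this input, the 0 and the G in the last two columns should come through unchanged, while C and T become A and B.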