awk gawk nawk

Replace fields with values of other fields in the same line


I have this kind of input:

rs10000004 C T 4 rs10000004 0 75625312 C C C C T 0 C T 
rs10000005 G A 4 rs10000005 0 75625355 G 0 A A A G A A 

I want to replace each column from the 8th to the last with "A" if its value is identical to the second field ($2), or with "B" if it is identical to the third field ($3). Otherwise the value should be printed as it is (zero values are expected in some columns).

Expected output

rs10000004 C T 4 rs10000004 0 75625312 A A A A B 0 A B 
rs10000005 G A 4 rs10000005 0 75625355 A 0 B B B A B B 

I tried the following, but it doesn't give me any results, just empty lines. I would prefer help improving my code over a new solution that uses something other than awk.

cat input | awk '{ for(i=8; i<=NF; i++) { if($i == $2) $i="A"; else if($i == $3) $i="B"; else $i == 0; } print $i }'

Thanks in advance


Solution

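  • Code :

    Two things go wrong in the original: "print $i" runs after the loop, when i is NF+1, so it prints a field that does not exist (hence the empty lines), and "$i == 0" is a comparison, not an assignment. Since unmatched values should be printed as they are, no final else branch is needed.

    awk '
    {
        # Fields 8..NF: "A" if equal to $2, "B" if equal to $3, otherwise unchanged
        for (i=8; i<=NF; i++) {
           if ($i == $2) {
               $i = "A";
           }
           else if ($i == $3) {
               $i = "B";
           }
        }
        print;    # print the whole rebuilt record, not $i
    }' input
    

    Or shorter :

    awk '
    {
        for (i=8; i<=NF; i++) {
           if ($i == $2)
               $i="A";
           else if ($i == $3)
               $i="B";
        }
    }
    1' input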
    

    Output :

    rs10000004 C T 4 rs10000004 0 75625312 A A A A B 0 A B 
    rs10000005 G A 4 rs10000005 0 75625355 A 0 B B B A B B
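
    To check the "printed as it is" requirement, here is a small self-contained sketch. It feeds the two sample rows through the loop, plus a third row that is an assumption added for illustration (rs10000006, containing an "N" that matches neither $2 nor $3). The variable name result is also only illustrative.

```shell
# Two rows from the question plus one hypothetical row (rs10000006)
# whose "N" value matches neither $2 nor $3 and so should stay unchanged.
result=$(printf '%s\n' \
  'rs10000004 C T 4 rs10000004 0 75625312 C C C C T 0 C T' \
  'rs10000005 G A 4 rs10000005 0 75625355 G 0 A A A G A A' \
  'rs10000006 A G 4 rs10000006 0 75625400 A N G 0' |
awk '{
    for (i = 8; i <= NF; i++) {
        if ($i == $2)      $i = "A"
        else if ($i == $3) $i = "B"
        # anything else (0, N, ...) is left untouched
    }
    print
}')
printf '%s\n' "$result"
```

    The third row should come out as "rs10000006 A G 4 rs10000006 0 75625400 A N B 0": the N and the 0 pass through unchanged. Note that assigning to a field makes awk rebuild the record with single spaces, so any trailing space in the input is dropped.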