
Add a prefix to values in a specified column of a VCF file with AWK


I'm working with a tab-delimited file (a VCF file) with a large number of columns (a small example is below):

1 13979 S01_13979 C G . . PR GT ./. ./.
1 13980 S01_13980 G A . . PR GT ./. ./.
1 13986 S01_13986 G A . . PR GT ./. ./.
1 14023 S01_14023 G A . . PR GT 0/0 ./.
1 15671 S01_15671 A T . . PR GT 0/0 0/0
1 60519 S01_60519 A G . . PR GT 0/0 0/0
1 60531 S01_60531 T C . . PR GT 0/0 0/0
1 63378 S01_63378 A G . . PR GT 1/1 ./.
1 96934 S01_96934 C T . . PR GT 0/0 0/0
1 96938 S01_96938 C T . . PR GT 0/0 0/0

In the first column (chromosome name) I have numbers from 1 to 26 (e.g. 1, 2, ..., 25, 26). I'd like to add the prefix HanXRQChr0 to the numbers from 1 to 9, and the prefix HanXRQChr to the numbers from 10 to 26. The values in all other columns should remain unchanged. So far I have tried a sed solution, but the output is not completely correct (the last pipe doesn't work):

cat test.vcf | sed -r '/^[1-9]/ s/^[1-9]/HanXRQChr0&/' | sed -r '/^[1-9]/ s/^[0-9]{2}/HanXRQChr&/' > test-1.vcf

How can I do that with AWK? I think AWK would be safer in my case, since it can directly change only the first column of the file.


Solution

  • Could you please try the following.

    awk -v first="HanXRQChr0" -v second="HanXRQChr" '
    BEGIN{ FS=OFS="\t" }
    $1>=1 && $1<=9{
      $1=first $1
    }
    $1>=10 && $1<=26{
      $1=second $1
    }
    1' Input_file
    

    You can change the values of the variables first and second as needed. If the value of the first field is from 1 to 9, the value of first is prefixed to it; if it is from 10 to 26, the value of second is prefixed. Setting FS and OFS to a tab keeps the output tab-delimited, because assigning to $1 makes awk rebuild the line with OFS, which is a single space by default.

    Explanation: the same code with a comment on each line.

    awk -v first="HanXRQChr0" -v second="HanXRQChr" '  ## Define the two prefixes as variables; change their values as needed.
    BEGIN{ FS=OFS="\t" }                                   ## Split and rejoin fields on tabs so rebuilt lines stay tab-delimited.
    $1>=1 && $1<=9{                                        ## If the first field is between 1 and 9 (inclusive), do the following.
      $1=first $1                                          ## Rebuild the first field with the value of first in front of it.
    }                                                      ## Close this condition block.
    $1>=10 && $1<=26{                                      ## If the first field is between 10 and 26 (inclusive), do the following.
      $1=second $1                                         ## Rebuild the first field with the value of second in front of it.
    }                                                      ## Close this condition block.
    1                                                      ## The pattern 1 is always true, so every line is printed.
    ' Input_file                                           ## Name of the input file.
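
The two prefixes together amount to zero-padding the chromosome number to two digits, so the same result can probably be had with a single rule. The following is only a sketch under that assumption, not the answer's original code: the file names sample.vcf and sample.renamed.vcf and the three sample lines are made up for illustration. It sets FS and OFS to a tab so that lines rebuilt after assigning to $1 keep their tab separators, and it only touches lines whose first field is a plain integer, so VCF header lines starting with # pass through unchanged.

```shell
# Hypothetical sample input: one header line plus two tab-delimited records.
printf '#CHROM\tPOS\tID\n1\t13979\tS01_13979\n12\t500\tS12_500\n' > sample.vcf

# Rewrite column 1 only when it is a plain integer; "%02d" pads single
# digits with a leading zero (1 -> 01, 12 -> 12). The trailing 1 prints
# every line, modified or not.
awk 'BEGIN { FS = OFS = "\t" }
     $1 ~ /^[0-9]+$/ { $1 = sprintf("HanXRQChr%02d", $1) }
     1' sample.vcf > sample.renamed.vcf
```

With this sample, the first columns of the two records become HanXRQChr01 and HanXRQChr12, and the header line is left as it was. Unlike the range checks above, this variant would also rename chromosome numbers outside 1 to 26, which may or may not be what you want.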