I'm working with a tab-delimited file (a VCF file) with a large number of columns (a small example is below):
1 13979 S01_13979 C G . . PR GT ./. ./.
1 13980 S01_13980 G A . . PR GT ./. ./.
1 13986 S01_13986 G A . . PR GT ./. ./.
1 14023 S01_14023 G A . . PR GT 0/0 ./.
1 15671 S01_15671 A T . . PR GT 0/0 0/0
1 60519 S01_60519 A G . . PR GT 0/0 0/0
1 60531 S01_60531 T C . . PR GT 0/0 0/0
1 63378 S01_63378 A G . . PR GT 1/1 ./.
1 96934 S01_96934 C T . . PR GT 0/0 0/0
1 96938 S01_96938 C T . . PR GT 0/0 0/0
In the first column (chromosome name) I have numbers from 1 to 26 (e.g. 1, 2, ..., 25, 26). I'd like to add the prefix HanXRQChr0 to the numbers from 1 to 9, and the prefix HanXRQChr to the numbers from 10 to 26. The values in all other columns should remain unchanged.
So far I have tried a sed solution, but the output is not completely correct (the last command in the pipeline doesn't work):
cat test.vcf | sed -r '/^[1-9]/ s/^[1-9]/HanXRQChr0&/' | sed -r '/^[1-9]/ s/^[0-9]{2}/HanXRQChr&/' > test-1.vcf
How can I do that with AWK? I think AWK would be safer to use in my case, since it can directly change only the first column of the file.
Could you please try the following:
awk -v first="HanXRQChr0" -v second="HanXRQChr" '
BEGIN{ FS=OFS="\t" }   # keep the output tab-delimited when $1 is reassigned
$1>=1 && $1<=9{
  $1=first $1
}
$1>=10 && $1<=26{
  $1=second $1
}
1' Input_file
You can change the values of the variables first and second as needed. The command checks the first field: if its value is from 1 to 9, it prepends the value of first to it, and if its value is from 10 to 26, it prepends the value of second. All other lines are printed unchanged.
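To check this behaviour on a small input, a sketch like the one below may help. The file names sample.vcf and sample.out.vcf and the shortened three-column rows are made up for illustration, and the separators are set to a tab on the assumption that the real file is tab-delimited, as the question states.

```shell
# Hypothetical sample: a header line, in-range values (1, 10, 26) and an out-of-range value (27).
printf '#CHROM\tPOS\tID\n1\t13979\tS01_13979\n10\t500\tS10_500\n26\t42\tS26_42\n27\t7\tS27_7\n' > sample.vcf

awk -v first="HanXRQChr0" -v second="HanXRQChr" '
BEGIN { FS = OFS = "\t" }            # assumption: tab-delimited input, tabs kept on output
$1>=1  && $1<=9  { $1 = first  $1 }  # 1..9   get the prefix with the extra 0
$1>=10 && $1<=26 { $1 = second $1 }  # 10..26 get the plain prefix
1' sample.vcf > sample.out.vcf

# Show only the first column of the result.
cut -f1 sample.out.vcf
```

The header line and the out-of-range value 27 pass through untouched, so the first column should read #CHROM, HanXRQChr01, HanXRQChr10, HanXRQChr26 and 27, with the remaining columns still separated by tabs.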
Explanation: a commented version of the code.
awk -v first="HanXRQChr0" -v second="HanXRQChr" '   ## Set the variables first and second; change their values as needed.
BEGIN{ FS=OFS="\t" }     ## Split on tabs and join with tabs, so a rebuilt line stays tab-delimited.
$1>=1 && $1<=9{          ## If the first field is greater than or equal to 1 and less than or equal to 9, do the following.
  $1=first $1            ## Rebuild the first field with the value of first placed before it.
}                        ## Close this condition block.
$1>=10 && $1<=26{        ## If the first field is greater than or equal to 10 and less than or equal to 26, do the following.
  $1=second $1           ## Rebuild the first field with the value of second placed before it.
}                        ## Close this condition block.
1                        ## A bare 1 is an always-true condition, so every line is printed.
' Input_file             ## Name of the input file.
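Since both prefixes amount to zero-padding the chromosome number to two digits, the two conditions could also be folded into one with sprintf. This is only a sketch under that assumption; pad_demo.vcf, its rows and the scaffold_7 name are hypothetical, and the regex guard is there so that header lines and non-numeric names are left alone.

```shell
# Hypothetical two-column sample with a header, two numeric names and one non-numeric name.
printf '#CHROM\tPOS\n3\t100\n12\t200\nscaffold_7\t300\n' > pad_demo.vcf

awk 'BEGIN { FS = OFS = "\t" }   # assumption: tab-delimited input and output
# Only purely numeric names in the range 1..26 are rewritten; %02d pads to two digits.
$1 ~ /^[0-9]+$/ && $1 >= 1 && $1 <= 26 { $1 = sprintf("HanXRQChr%02d", $1) }
1' pad_demo.vcf > pad_demo.out.vcf

# Show only the first column of the result.
cut -f1 pad_demo.out.vcf
```

This variant trades the two configurable variables for a single format string, so it only fits when the prefixes really differ by nothing more than the padding zero.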