I have a data frame with a column:
nf1$Info = AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.25;MQRankSum=-1.026e+00;QD=10.18;ReadPosRankSum=1.03;SOR=0.693
I'm trying to extract a specific value from this column.
For example, I'm interested in "MQRankSum", and I used:
str_extract(nf1$Info,"[MQRankSum]+=[:punct:]+[0-9]+[.]+[0-9]+")
It returns the value for BaseQRankSum instead of MQRankSum.
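Putting characters inside square brackets creates a character class that matches any one of those characters, so [yes]+ matches yyyyyyyyy, eyyyyss, etc. Likewise, [MQRankSum]+ matches any run of the letters M, Q, R, a, n, k, S, u and m, which is why your pattern first succeeds on QRankSum=-1.026 inside BaseQRankSum=-1.026e+00.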
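To see the character class at work, here is a minimal base-R sketch that applies only the [MQRankSum]+= part of the pattern. It uses regexpr() and regmatches() instead of stringr (an assumption made so nothing needs installing); the sample string is the start of the Info value from the question.

```r
# Start of the Info string from the question
info <- "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00"

# [MQRankSum]+ is a character class: it matches any run of the letters
# M, Q, R, a, n, k, S, u, m (not the literal word "MQRankSum")
class_hit <- regmatches(info, regexpr("[MQRankSum]+=", info))
print(class_hit)
# The first run of those letters followed by "=" is the "QRankSum="
# tail of "BaseQRankSum=", so that is what is returned

# The same idea with [yes]+: any mix of y, e and s matches
yes_hit <- regmatches("eyyyyss", regexpr("[yes]+", "eyyyyss"))
print(yes_hit)
```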
What you want to do is match the word MQRankSum, then =, and then one or more chars other than a semicolon:
str_extract(nf1$Info,"MQRankSum=[^;]+")
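If stringr is not available, a rough base-R equivalent is sketched below. The two-row nf1 data frame is hypothetical (the second row deliberately has no MQRankSum field). str_extract() returns NA for rows without a match, whereas regmatches() on its own silently drops them, so the sketch fills those rows with NA to keep the result aligned with the data frame.

```r
# Hypothetical two-row version of nf1; the second row lacks MQRankSum
nf1 <- data.frame(
  Info = c(
    "AC=1;AF=0.500;AN=2;DP=4;MQ=28.25;MQRankSum=-1.026e+00;QD=10.18;SOR=0.693",
    "AC=2;AN=2;DP=3"
  ),
  stringsAsFactors = FALSE
)

# Literal "MQRankSum=" followed by one or more characters other than ";"
m <- regexpr("MQRankSum=[^;]+", nf1$Info)

# regexpr() gives -1 for rows with no match; fill those with NA
mq_field <- rep(NA_character_, nrow(nf1))
mq_field[m != -1] <- regmatches(nf1$Info, m)
print(mq_field)
```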
If you want to exclude MQRankSum= from the match, use a lookbehind:
str_extract(nf1$Info,"(?<=MQRankSum=)[^;]+")
The (?<=MQRankSum=) positive lookbehind makes sure there is MQRankSum= text immediately to the left of the current location, and only then matches 1 or more chars other than ;.
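In base R the same lookbehind needs perl = TRUE, because the default (TRE) regex engine does not support lookbehind. The sketch below, a base-R stand-in for the stringr call and not the original answer's code, also shows that the extracted value is still text, so it has to be converted with as.numeric() if a number is needed.

```r
# Full Info string from the question
info <- "AC=1;AF=0.500;AN=2;BaseQRankSum=-1.026e+00;ClippingRankSum=-1.026e+00;DP=4;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.25;MQRankSum=-1.026e+00;QD=10.18;ReadPosRankSum=1.03;SOR=0.693"

# (?<=MQRankSum=) only checks that "MQRankSum=" sits just to the left;
# it is not part of the returned match. perl = TRUE enables lookbehind.
value_txt <- regmatches(info, regexpr("(?<=MQRankSum=)[^;]+", info, perl = TRUE))
print(value_txt)

# The match is a string; convert it to get a numeric value
value_num <- as.numeric(value_txt)
print(value_num)
```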