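I'm working in R. I want to match whole words in a text, taking punctuation into account, so that a word attached to other text by punctuation (such as a hyphen) does not count as a match. Example: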
to_match = c('eye','nose')
text1 = 'blah blahblah eye-to-eye blah'
text2 = 'blah blahblah eye blah'
I would like `eye` to be matched in `text2` but not in `text1`.
That is, the command:
to_match[sapply(paste0('\\<',to_match,'\\>'),grepl,text1)]
should return `character(0)`. But right now, it returns `"eye"`.
I also tried `'\\b'` instead of `'\\<'`, with no success.
Use
to_match[sapply(paste0('(?:\\s|^)',to_match,'(?:\\s|$)'),grepl,text1)]
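A self-contained way to check this on both sample strings is to wrap the same pattern in a small helper. The function name `whole_word_matches` is hypothetical and not part of the original answer; the word list and texts are the ones from the question.

```r
to_match <- c("eye", "nose")
text1 <- "blah blahblah eye-to-eye blah"
text2 <- "blah blahblah eye blah"

# Hypothetical helper: return the entries of `words` that occur in `text`
# delimited only by whitespace or by the start/end of the string.
# Uses grepl()'s default TRE engine (perl = FALSE).
whole_word_matches <- function(words, text) {
  # One pattern per word, e.g. "(?:\\s|^)eye(?:\\s|$)"
  patterns <- paste0("(?:\\s|^)", words, "(?:\\s|$)")
  # sapply() calls grepl(pattern, x = text) for each pattern,
  # giving a logical vector used to subset `words`
  words[sapply(patterns, grepl, x = text)]
}

whole_word_matches(to_match, text1)  # character(0)
whole_word_matches(to_match, text2)  # "eye"
```

Because only whitespace counts as a delimiter, a word followed by other punctuation, as in `eye,` or `eye.`, is not matched either. Also, the words are pasted into the pattern as-is, so entries containing regex metacharacters (for example `.` or `+`) would need escaping first.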
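The point is that word boundaries match between a word character and a non-word character. A hyphen is a non-word character, which is why the `\<eye\>` pattern found a match inside `eye-to-eye`. You want the word to match only when it sits between whitespace and/or the start or end of the string.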
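A quick check of this boundary behaviour with a few sample strings (`"eyeball"` is made up for illustration) shows why the hyphenated form matches with word boundaries but not with the whitespace-delimited pattern:

```r
# "-" is not a word character (\w is [[:alnum:]_] in R's regex engines)
grepl("\\w", "-")                             # FALSE

# \< and \> sit between a word and a non-word character,
# so "eye" followed by "-" is treated as a whole word
grepl("\\<eye\\>", "eye-to-eye")              # TRUE

# Here "eye" is followed by the word character "b", so there is no \>
grepl("\\<eye\\>", "eyeball")                 # FALSE

# Requiring whitespace or a string edge on both sides rejects "eye-to-eye"
grepl("(?:\\s|^)eye(?:\\s|$)", "eye-to-eye")  # FALSE
```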
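In a TRE regex (the engine `grepl` uses by default), this kind of delimiter check is better done with groups: the TRE library does not support lookarounds, and since you only need to test whether a string matches the pattern at all (TRUE or FALSE), it does no harm that the groups consume the surrounding whitespace.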
The `(?:\s|^)` non-capturing group matches any whitespace character or the start of the string, and `(?:\s|$)` matches whitespace or the end of the string.
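TRE lacks lookarounds, but if switching to the PCRE engine with `perl = TRUE` is acceptable, lookarounds give an equivalent test that does not consume the surrounding whitespace. This is a minimal sketch of that swapped-in alternative, not part of the original answer; `matches_pcre` is a hypothetical helper name.

```r
to_match <- c("eye", "nose")
text1 <- "blah blahblah eye-to-eye blah"
text2 <- "blah blahblah eye blah"

# (?<!\S): not preceded by a non-space character (whitespace or string start)
# (?!\S):  not followed by a non-space character (whitespace or string end)
patterns <- paste0("(?<!\\S)", to_match, "(?!\\S)")

# Hypothetical helper; perl = TRUE switches grepl() from TRE to PCRE,
# which supports lookbehind and lookahead
matches_pcre <- function(text) {
  to_match[sapply(patterns, grepl, x = text, perl = TRUE)]
}

matches_pcre(text1)  # character(0)
matches_pcre(text2)  # "eye"
```

For a pure TRUE/FALSE test the two approaches give the same result; lookarounds only matter if you later need the match positions to cover just the word itself.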