How to search for strings with parentheses in R


Using R, I have a long list of keywords that I'm searching for in a dataset. One of the keywords needs to have parentheses around it in order to be included.

I've been attempting to replace the parentheses in the keywords list with \\ followed by the parentheses, but have not been successful. If there is a way to make the grepl() function treat them literally, that would also be helpful. Here is an example of what I'm trying to accomplish:

patterns<-c("dog","cat","(fish)")

data<-c("brown dog","black bear","salmon (fish)","red fish")

patterns2<- paste(patterns,collapse="|")

grepl(patterns2,data)

[1]  TRUE FALSE  TRUE  TRUE

I would like salmon (fish) to give TRUE, and red fish to give FALSE.

Thank you!


Solution

  • As noted by @joran in the comments, the pattern should look like so:

    patterns<-c("dog","cat","\\(fish\\)")
    

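    The \\ before each parenthesis tells R to read it literally when searching for the pattern, rather than as a regex grouping operator. In R source code, \\ stands for a single backslash in the string, and that single backslash is what the regex engine sees.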
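
    As a quick check, here is a minimal sketch that reuses the data vector from the question together with the escaped keyword. The name result is introduced here only for illustration.

    ```r
    # Keywords with the parentheses escaped for the regex engine
    patterns <- c("dog", "cat", "\\(fish\\)")
    data <- c("brown dog", "black bear", "salmon (fish)", "red fish")

    # Join the keywords into one alternation: dog|cat|\(fish\)
    patterns2 <- paste(patterns, collapse = "|")

    result <- grepl(patterns2, data)
    print(result)
    # [1]  TRUE FALSE  TRUE FALSE
    ```

    "salmon (fish)" now matches because it contains the literal text (fish), while "red fish" no longer does.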

    Easiest way to achieve this if you don't want to make the change manually:

    patterns <- gsub("([()])","\\\\\\1", patterns)
    

    Which will result in:

    [1] "dog" "cat" "\\(fish\\)"
    

    If you're not very familiar with regular expressions, what happens here is that it looks for any one character within the square brackets, i.e., either ( or ). The round brackets around that tell it to save whatever it finds that matches the contents. Then, the first four backslashes in the second argument tell it to put one literal backslash in front of what it found: R reads each pair of backslashes in the source as one backslash, and the regex engine then reads the resulting pair as a single literal backslash. That single backslash is what print() displays as \\ in the output above. Finally, the \\1 tells it to add whatever it saved from the first argument, i.e., either ( or ).
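
    The layers of escaping are easier to see when the stored characters are written out directly. The sketch below is only an illustration: escaped and hits are names assumed here, and the data vector is the one from the question.

    ```r
    patterns <- c("dog", "cat", "(fish)")
    data <- c("brown dog", "black bear", "salmon (fish)", "red fish")

    # Put one literal backslash in front of every "(" or ")"
    escaped <- gsub("([()])", "\\\\\\1", patterns)

    # writeLines() shows the characters actually stored in each string,
    # without the extra escaping that print() adds
    writeLines(escaped)
    # dog
    # cat
    # \(fish\)

    # The third keyword grew from 6 to 8 characters: one backslash per parenthesis
    nchar(escaped[3])
    # [1] 8

    # The escaped keywords can then be combined and searched as before
    hits <- grepl(paste(escaped, collapse = "|"), data)
    print(hits)
    # [1]  TRUE FALSE  TRUE FALSE
    ```

    Note that the character class [()] only covers parentheses. Keywords containing other regex metacharacters, such as . or +, would need those characters added to the class as well.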