R: Regex error despite it working on regex101


This is my first question (I'm still learning R), so I apologize in advance if it is too basic.
I'm trying to write a regex that matches the first string but not the second one.

strings <- c("p1_32_XYX_cancer_1", "p1_32_XYX_cancer_ttt_1")

I tested on regex101, and the best I came up with is the pattern below (it works on regex101). However, when I use it in R, I get the following error:

"(^p5[0-9].*XYX.*cancer)(?!.*ttt)"
Error in grep(needle, haystack, ...) : invalid regular expression 'mz|(^p5[0-9].*XYX.*cancer)(?!.*ttt)', reason 'Invalid regexp'

Sorry for being unclear earlier; the exact code is:

ctc_gastric_df <- select(m,matches("mz|(^p5[0-9].*XYX.*cancer)(?!.*ttt)"))


Solution

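  • The negative lookahead (?!.*ttt) is Perl-compatible (PCRE) syntax, which R's default regex engine does not support. We need perl = TRUE to make the regex in the OP's code work without the error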

    grep("(^p5[0-9].*XYX.*cancer)(?!.*ttt)", strings, perl = TRUE)
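
A minimal base R sketch of the difference between the two engines follows. One caveat: the pattern in the question requires the prefix p5 followed by a digit, while both sample strings start with p1_, so the pattern as written matches neither of them. The sketch therefore assumes a p1_ prefix purely for illustration. The data frame m and its column names are also hypothetical stand-ins for the OP's data.

```r
strings <- c("p1_32_XYX_cancer_1", "p1_32_XYX_cancer_ttt_1")

# Assumption: the prefix is changed from ^p5[0-9] (as in the question)
# to ^p1_[0-9] so that the sample strings are able to match at all.
pattern <- "(^p1_[0-9].*XYX.*cancer)(?!.*ttt)"

# Default engine: the lookahead is rejected, so grepl() raises an error.
default_result <- tryCatch(
  suppressWarnings(grepl(pattern, strings)),
  error = function(e) "error"
)
print(default_result)

# PCRE engine: the lookahead is understood.
pcre_result <- grepl(pattern, strings, perl = TRUE)
print(pcre_result)  # TRUE FALSE

# Hypothetical data frame standing in for `m`; columns are selected
# by name with the same alternation the OP passed to matches().
m <- data.frame(mz = 1, p1_32_XYX_cancer_1 = 2, p1_32_XYX_cancer_ttt_1 = 3)
keep <- grepl(paste0("mz|", pattern), names(m), perl = TRUE)
selected <- m[, keep, drop = FALSE]
print(names(selected))
```

The last step uses base R subsetting rather than dplyr so that the sketch runs without extra packages. If the installed tidyselect version exposes a perl argument on matches() (worth checking in its help page), select(m, matches("...", perl = TRUE)) would be the closer equivalent of the OP's call.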