
R Regex: Parentheses Not Acting as Metacharacters


I am trying to split a string on the group "%in%" and on the character "@". All the documentation I can find says that parentheses are metacharacters used for grouping in R regex. So the code

    > strsplit('example%in%aa(bbb)aa@cdef', '[(%in%)@]', perl=TRUE)

SHOULD give me

    [[1]]
    [1] "example" "aa(bbb)aa"      "cdef"

That is, it should leave the parentheses in "aa(bbb)aa" alone, because the parentheses in the matching expression are not escaped. But instead it ACTUALLY gives me

    [[1]]
    [1] "example" ""   ""    ""    "aa"    "bbb"   "aa"    "cdef"

as if the parentheses were not metacharacters! What is up with this and how can I fix it? Thanks!

The same thing happens with and without the argument perl=TRUE in strsplit.
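One way to see what the bracketed pattern actually matches is to list every match with the base R functions gregexpr() and regmatches(). This is a small diagnostic sketch, not part of the original question; the variable names x and hits are illustrative. Each match turns out to be a single character from the set ( % i n ) @, which accounts for every split point in the unexpected output.

```r
x <- "example%in%aa(bbb)aa@cdef"

# Collect every substring matched by the bracket expression.
# Each match is exactly one character, because [...] is a character class.
hits <- regmatches(x, gregexpr("[(%in%)@]", x))[[1]]
print(hits)
# [1] "%" "i" "n" "%" "(" ")" "@"
```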


Solution

  • Not sure what documentation you're reading, but the Extended Regular Expressions section in ?regex says:

    Most metacharacters lose their special meaning inside a character class. ... (Only '^ - \ ]' are special inside character classes.)

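    You don't need a character class. The square brackets turn the whole pattern into a set that matches any single one of the characters ( % i n ) @, which is why every "%", "i", "n" and parenthesis was treated as a separator. Use alternation with | instead (you likely don't need to group "%in%" either, but it shouldn't hurt anything):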

    > strsplit('example%in%aa(bbb)aa@cdef', '(%in%)|@', perl=TRUE)
    [[1]]
    [1] "example"   "aa(bbb)aa" "cdef"
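A short sketch of both points above, assuming base R's default regex engine (no perl = TRUE). First, because order and repeated characters do not matter inside [...] and "(" and ")" are literal there, the original class is the same set as [()%in@], so both give identical splits. Second, the alternation works without the group and without perl = TRUE, since neither "%" nor "@" is a regex metacharacter. The names x, a, b and split_ok are illustrative only.

```r
x <- "example%in%aa(bbb)aa@cdef"

# Inside a character class, "(" and ")" are literal and duplicates are
# ignored, so these two classes describe the same set of characters.
a <- strsplit(x, "[(%in%)@]")[[1]]
b <- strsplit(x, "[()%in@]")[[1]]
stopifnot(identical(a, b))

# Plain alternation: split on the whole string "%in%" or on "@".
split_ok <- strsplit(x, "%in%|@")[[1]]
print(split_ok)
# [1] "example"   "aa(bbb)aa" "cdef"
```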