I am trying to split a string on the substring "%in%" and the character "@". All the documentation I can find says that parentheses are metacharacters used for grouping in R regex. So the code
> strsplit('example%in%aa(bbb)aa@cdef', '[(%in%)@]', perl=TRUE)
SHOULD give me
[[1]]
[1] "example" "aa(bbb)aa" "cdef"
That is, it should leave the parentheses in "aa(bbb)aa" alone, because the parentheses in the matching expression are not escaped. But instead it ACTUALLY gives me
[[1]]
[1] "example" "" "" "" "aa" "bbb" "aa" "cdef"
as if the parentheses were not metacharacters! What is up with this and how can I fix it? Thanks!
This is true with and without the argument perl=TRUE in strsplit.
Not sure what documentation you're reading, but the Extended Regular Expressions section in ?regex says:
Most metacharacters lose their special meaning inside a character class. ... (Only '^ - \ ]' are special inside character classes.)
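To see what that means for the pattern in the question, here is a minimal sketch that lists every match of the bracketed pattern. The variable names x and hits are only for illustration.

```r
x <- "example%in%aa(bbb)aa@cdef"

# Inside [...], each of ( % i n ) @ is a literal single-character alternative,
# so the pattern matches one character at a time, never the substring "%in%".
hits <- regmatches(x, gregexpr("[(%in%)@]", x))[[1]]
print(hits)
# [1] "%" "i" "n" "%" "(" ")" "@"
```

Each of those seven matches is a split point for strsplit, which is where the empty strings and the broken-up "aa(bbb)aa" come from.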
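You don't need to create a character class. Inside the brackets, the pattern matches any single one of the characters "(", "%", "i", "n", ")" and "@", which is why the parentheses in your string were treated as split points. Just use alternation with "|" instead (you likely don't need to group "%in%" either, but it shouldn't hurt anything):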
> strsplit('example%in%aa(bbb)aa@cdef', '(%in%)|@', perl=TRUE)
[[1]]
[1] "example" "aa(bbb)aa" "cdef"
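As a quick check of the two side remarks (that the group is optional, and that perl=TRUE makes no difference here), the following sketch drops both. The names x and parts are only for illustration.

```r
x <- "example%in%aa(bbb)aa@cdef"

# Alternation without a group, using the default regex engine:
# split on the whole substring "%in%" or on a single "@".
parts <- strsplit(x, "%in%|@")[[1]]
print(parts)

# The parentheses in the input are left alone and three pieces come back.
stopifnot(identical(parts, c("example", "aa(bbb)aa", "cdef")))
```

The group only becomes necessary if you want to apply a quantifier or further alternation to "%in%" as a unit.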