
Split a string at a delimiter conditionally with a negative lookahead assertion


I want to split a string at . or : unless the next character is )

Following the question "R strsplit: Split based on character except when a specific character follows", why isn't

strsplit("Glenelg (Vic.)",'\\.|:(?!\\))', perl = TRUE)

returning

[[1]]
[1] "Glenelg (Vic.)"

Instead, it splits at the . character, like so:

[1] "Glenelg (Vic" ")"

Solution

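  • The pattern is not grouped correctly. Alternation has the lowest precedence, so \.|:(?!\)) matches a . anywhere in the string, or a : that is not followed by ). The lookahead applies only to the : alternative. If you group the . and : alternatives, as in '(?:\\.|:)(?!\\))', the lookahead applies to both and it will work.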

    However, a simpler version of the regex uses a character class:

    strsplit("Glenelg (Vic.)",'[.:](?!\\))', perl = TRUE)
    [[1]]
    [1] "Glenelg (Vic.)"
    

    Here, [.:](?!\)) matches a . or a : only when it is not immediately followed by ).

    See the regex demo.
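
To make the difference concrete, here is a minimal sketch that runs the three patterns discussed above side by side. The variable names and the second and third test strings are made up for illustration; only "Glenelg (Vic.)" comes from the question.

```r
# Test inputs: the first is from the question, the other two are
# hypothetical strings added to show where a split should still happen.
x <- c("Glenelg (Vic.)", "a.b:c", "f(x:) g")

ungrouped  <- "\\.|:(?!\\))"       # lookahead binds to ':' only
grouped    <- "(?:\\.|:)(?!\\))"   # lookahead applies to both alternatives
char_class <- "[.:](?!\\))"        # same meaning, written as a character class

# Helper (illustrative name) that splits every input with one pattern.
split_with <- function(pattern) strsplit(x, pattern, perl = TRUE)

res_ungrouped <- split_with(ungrouped)
res_grouped   <- split_with(grouped)
res_class     <- split_with(char_class)

res_ungrouped[[1]]  # "Glenelg (Vic" ")"   : the '.' is split unconditionally
res_grouped[[1]]    # "Glenelg (Vic.)"     : no split, '.' is followed by ')'
res_class[[2]]      # "a" "b" "c"          : ordinary '.' and ':' still split
res_class[[3]]      # "f(x:) g"            : ':' before ')' is left alone

# The grouped pattern and the character class give the same result here.
identical(res_grouped, res_class)
```

For single-character delimiters the character class is usually the easier form to read; the non-capturing group is still needed when the alternatives are longer than one character.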