regex, r, strsplit

R: split only when special regex condition doesn't match


How would you split after every and/ERT, but only when it is not followed by "/V" within the next word, as in:

text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

# my attempt (does not work)
> strsplit(text, "(?<=and/ERT)\\s(?!./V.)", perl=TRUE)
                                    ^^^^

# Expected return
[[1]]    
[1] "faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT"
[2] "not else/VHGB propositions one and/ERT"
[3] "two/CDF and/ERT"            
[4] "three/ABC"    

Solution

  • The lookahead has to cover the whole next word, not just a single character before /V:

    (?<=and/ERT)\\s(?!\\S+/V)
                      ^^^^
    

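    You need \\S+ rather than .* here: .* can run to the end of the string, so the lookahead would block the split whenever /V appears anywhere later in the text, even two or more words ahead.

    \\S+ matches one or more non-whitespace characters, so the lookahead only inspects the word immediately after the space.

    Lastly, the final period in your lookahead (the . after /V) can be safely dropped.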

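For reference, here is a minimal sketch of the pattern dropped into the strsplit call, reusing the text vector from the question. The variable names parts and greedy are only illustrative, and the .* variant is included purely to show the over-reaching lookahead described above.

```r
# Sample input copied from the question.
text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT three/ABC")

# Split on a whitespace character that follows "and/ERT",
# unless the very next word carries a tag starting with /V.
# perl = TRUE is needed because lookbehind/lookahead are PCRE features.
parts <- strsplit(text, "(?<=and/ERT)\\s(?!\\S+/V)", perl = TRUE)
print(parts)

# For comparison only: with .* the lookahead scans the rest of the string,
# so any later /V suppresses the split and fewer pieces come back.
greedy <- strsplit(text, "(?<=and/ERT)\\s(?!.*/V)", perl = TRUE)
print(length(parts[[1]]))
print(length(greedy[[1]]))
```

With the sample text, the first call should reproduce the four pieces listed under the expected return, while the .* variant merges the first two of them because /VHGB still appears later in the string.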