Find substring not at beginning of string


I want to find and replace a substring of a string, but only if the substring is NOT at the beginning of the string. I've tried the code below, which gives these errors:

string <- c("some text followed by substring then more text followed by substring and then lots of text with lots of examples of substring and then more text", "substring at the beginning example with substring later in the string and then another substring blah blah, blah")
gsub("(?<!^)substring", "replacement", string)
Error in gsub("(?<!^)substring", "replacement", string) : 
  invalid regular expression '(?<!^)substring', reason 'Invalid regexp'
gsub("(?<!\\^)substring", "replacement", string)
Error in gsub("(?<!\\^)substring", "replacement", string) : 
  invalid regular expression '(?<!\^)substring', reason 'Invalid regexp'

I'm using the following code, which works but doesn't seem "right":

gsub("(.+?)substring", "\\1replacement", string)

Is this the best option?


Solution

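  • The pattern itself is fine; the issue is the perl argument. Lookbehind assertions such as (?<!^) are only supported by the PCRE engine, which gsub uses when perl = TRUE, and according to ?gsub the default is perl = FALSE: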

    gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE, fixed = FALSE, useBytes = FALSE)

    gsub("(?<!^)substring", "replacement", string, perl = TRUE)
    #[1] "some text followed by replacement then more text followed by replacement and then lots of text with lots of examples of replacement and then more text"
    #[2] "substring at the beginning example with replacement later in the string and then another replacement blah blah, blah"
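
    As a quick check of the approach above, here is a minimal sketch on two short made-up strings (the vector x and the variable names are illustrative, not from the question). It also tries a negative lookahead placed before the match, (?!^)substring, which should express the same "not at the start" condition; both forms rely on perl = TRUE.

```r
# Short hypothetical inputs, not taken from the original question
x <- c("a substring b substring", "substring a substring")

# Negative lookbehind: match "substring" unless the string start is right before it
lookbehind <- gsub("(?<!^)substring", "replacement", x, perl = TRUE)

# Negative lookahead at the match position: same condition, written the other way round
lookahead <- gsub("(?!^)substring", "replacement", x, perl = TRUE)

print(lookbehind)
#[1] "a replacement b replacement" "substring a replacement"

# Both patterns should agree on these inputs
stopifnot(identical(lookbehind, lookahead))
```

    Either form leaves a leading "substring" untouched and replaces every later occurrence; which one to use is mostly a matter of taste.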