Nested regex lookahead and lookbehind


I am having problems with nested positive/negative lookahead and lookbehind in regex.

Let's say I want to replace each '*' in a string with '%', and that '\' escapes the next character. (I am turning a regex into an SQL LIKE command.)

So the string

  • '*test*' should be changed to '%test%',
  • '\\*test\\*' -> '\\%test\\%', but
  • '\*test\*' and '\\\*test\\\*' should stay the same.

I tried:

(?<!\\)(?=\\\\)*\*      but this doesn't work
(?<!\\)((?=\\\\)*\*)    ...
(?<!\\(?=\\\\)*)\*      ...
(?=(?<!\\)(?=\\\\)*)\*  ...

What is the correct regex that will match the '*'s in the examples given above?

What is the difference between (?<!\\(?=\\\\)*)\* and (?=(?<!\\)(?=\\\\)*)\*? Or, if both are fundamentally wrong, what is the difference between regexes built in these two ways?


Solution

  • To find an unescaped character, you look for a character that is preceded by an even number (including zero) of escape characters. This is relatively straightforward:

    (?<=(?<!\\)(?:\\\\)*)\*        # this is explained in Tim Pietzcker's answer
    

    Unfortunately, many regex engines do not support variable-length look-behind, so we have to substitute with look-ahead:

    (?=(?<!\\)(?:\\\\)*\*)(\\*)\*  # also look at ridgerunner's improved version
    

    Replace each match with the contents of group 1 followed by a % sign.

    Explanation

    (?=           # start look-ahead
      (?<!\\)     #   a position not preceded by a backslash (via look-behind)
      (?:\\\\)*   #   an even number of backslashes (don't capture them)
      \*          #   a star
    )             # end look-ahead. If found,
    (             # start group 1
      \\*         #   match any number of backslashes in front of the star
    )             # end group 1
    \*            # match the star itself
    

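    The look-ahead ensures that the match only succeeds when the star is preceded by an even number of backslashes (possibly zero). Because a look-ahead does not advance the position in the string, with this approach those backslashes still have to be matched into a group so that they can be written back in the replacement.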
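
As a quick check of the look-ahead version against the four examples from the question, here is a minimal Python sketch. The function name `star_to_percent` and the choice of Python's `re` module are assumptions made for illustration only; the question is tagged Perl, where the same pattern should carry over.

```python
import re

# Look-ahead version from the answer above: group 1 captures the (even-length)
# run of backslashes directly in front of an unescaped star.
UNESCAPED_STAR = re.compile(r'(?=(?<!\\)(?:\\\\)*\*)(\\*)\*')


def star_to_percent(text: str) -> str:
    """Replace every unescaped '*' with '%', keeping the preceding backslashes."""
    # \g<1> writes the captured backslashes back; '%' takes the place of the star.
    return UNESCAPED_STAR.sub(r'\g<1>%', text)


if __name__ == "__main__":
    # Raw strings, so every backslash below is one literal backslash character.
    samples = [r'*test*', r'\\*test\\*', r'\*test\*', r'\\\*test\\\*']
    for sample in samples:
        print(sample, '->', star_to_percent(sample))
```

Running the script converts the first two inputs and leaves the two escaped ones unchanged, which matches the expectations listed in the question.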