regex, pcre, lookbehind

Negative Lookbehind fails before an Optional Token


(?<!a)b?c

Against abc, this regex matches c. Am I missing something?


Solution

  • No, you are not missing anything: matching c is the correct result. Here is a quick walk-through of the match from the engine's standpoint.

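    • Try to match starting at the position before the a. The lookbehind passes and b? matches zero b, but c cannot match a. Fail. Advance in the string.
    • Try to match starting at the position before the b. The negative lookbehind (?<!a) sees the a. Fail. Advance in the string.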
    • Current position: right before the c
    • Can the negative lookbehind (?<!a) assert that what precedes is not a? Check. (It's b)
    • Can b? match zero or one b? Check. We match zero b
    • Can c match a c? Check.
    • Are there any more tokens to match? Nope. We have a match.
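
The walk-through above can be reproduced with a small sketch. It uses Python's standard `re` module rather than PCRE (an assumption made for convenience; a fixed-width lookbehind like this one behaves the same way in both), and `first_match_trace` is a hypothetical helper name, not part of any library. It tries the pattern at each start offset in turn, which is roughly how the engine advances through the string.

```python
import re

pattern = re.compile(r"(?<!a)b?c")
subject = "abc"

def first_match_trace(pattern, subject):
    """Try the pattern at each start offset and record the outcome.

    pattern.match(subject, start) anchors the attempt at `start` but still
    lets the lookbehind see the characters before that offset.
    """
    trace = []
    for start in range(len(subject) + 1):
        m = pattern.match(subject, start)
        trace.append((start, m.group(0) if m else None))
        if m:
            break  # the engine stops at the first successful attempt
    return trace

trace = first_match_trace(pattern, subject)
print(trace)  # [(0, None), (1, None), (2, 'c')]
```

The attempts at offsets 0 and 1 fail, and the attempt at offset 2 succeeds with b? matching nothing, so the overall match is just c.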

    Looking Far Behind

    In .NET, which has infinite lookbehind, you could use this:

    (?<!a.*)b?c
    

    But PCRE does not have infinite lookbehind. You can use this instead:

    ^[^a]*\Kb?c
    

    How it works:

    • The ^ anchor asserts that we are at the beginning of the string
    • [^a]* greedily matches any run of characters other than a (possibly none)
    • The \K tells the engine to drop what was matched so far from the final match it returns
    • b?c matches the optional b and the c
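
As a rough illustration of the same idea outside PCRE: Python's standard `re` module has no \K, so the sketch below stands in a capture group for it. That substitution is an assumption of this example, not what the answer's pattern does literally, and `match_after_prefix` is a hypothetical helper name.

```python
import re

# The capture group plays the role of \K: the a-free prefix is matched
# but only group 1 is reported.
pattern = re.compile(r"^[^a]*(b?c)")

def match_after_prefix(subject):
    """Return the b?c part if a c can be reached from the start without crossing an a."""
    m = pattern.search(subject)
    return m.group(1) if m else None

print(match_after_prefix("abc"))  # None: the a blocks the prefix
print(match_after_prefix("xc"))   # c
print(match_after_prefix("xbc"))  # c (the greedy [^a]* has already consumed the b)
```

One design point worth noting: because [^a]* is greedy, it takes the b whenever it can, so the reported text is c rather than bc. If the b should be part of the result, a lazy prefix ([^a]*?) leaves it for b? to match.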