Tags: regex, option-type, icu, negative-lookbehind

What is the most efficient way to get a negative lookbehind to work alongside an optional value?


I'm aware that negative lookbehinds are zero-width, but I've noticed that they don't work as I expect when the lookbehind is followed by an optional token. Why does this happen?

(?<!test):?(\d{3})

It fails to match test123, as intended, but it still matches test:123.

Is there a solution to this other than using (?<!test|test:) instead? I'd rather avoid that, as the regex I'd like to apply this to already has a lot of negative lookbehind phrases, which I'd be doubling.

Edit: I initially tested this in a PCRE editor, but I'm coding with ICU.


Solution

  • With the ICU regex engine, you have access to constrained-width lookbehinds, which allow limiting (bounded) quantifiers of known maximum length inside the lookbehind.

    So, use

    (?<!test:{0,1})\d{3}
            ^^^^^^
    

    The :{0,1} matches zero or one colon, so the lookbehind rejects digits preceded by either test or test:.

    Note that ICU regex does not work the same as PCRE, so be aware of the differences when testing in an incompatible environment, such as regex101.com.

    Some cool PCRE features that are missing in ICU:

    • (*SKIP)(*FAIL) verbs
    • \K operator

    Some cool ICU features missing in PCRE:

    • Constrained-width lookbehind ((?<!test:{0,1})\d{3})
    • Character class intersection ([\p{Letter}&&\p{script=cyrillic}]) or subtraction ([\p{Letter}--\p{script=latin}])
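
As for why the original pattern matches test:123: because :? is optional, the engine can start the match right after the colon. At that position the four characters before the digits are est:, not test, so the lookbehind passes and :? simply matches nothing. The sketch below compares the two patterns. It is written in Java as an assumption made for the sake of a runnable example: java.util.regex is not ICU, but it also accepts bounded quantifiers inside a lookbehind, so it should behave the same way for these inputs. The class and method names (LookbehindDemo, firstMatch) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Pattern from the question: the optional colon sits outside the lookbehind.
    static final Pattern ORIGINAL = Pattern.compile("(?<!test):?(\\d{3})");

    // Pattern from the answer: the optional colon is moved inside the lookbehind.
    // Assumption: Java's engine stands in for ICU here; both accept :{0,1} in a lookbehind.
    static final Pattern FIXED = Pattern.compile("(?<!test:{0,1})\\d{3}");

    // Returns the text of the first match in input, or null if nothing matches.
    static String firstMatch(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {"test123", "test:123", "abc123"};
        for (String sample : samples) {
            System.out.println(sample
                    + "  original=" + firstMatch(ORIGINAL, sample)
                    + "  fixed=" + firstMatch(FIXED, sample));
        }
    }
}
```

Under Java's engine, the original pattern finds 123 in test:123, while the constrained-width version finds no match in either test123 or test:123. Both still match 123 in abc123. It is worth re-running the same three inputs against your actual ICU environment before relying on this.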