Tags: regex, regex-lookarounds

Negative lookahead vs Positive lookahead syntax


Given the following text:

My name is foo.

My name is bar.

The goal is to return each line that contains, or does not contain, a particular substring. Both of the following patterns, one using a positive lookahead and one using a negative lookahead, return the same result:

Positive lookahead: ^(?=.*bar).*$ returns My name is bar.

Negative lookahead: ^((?!foo).)*$ returns My name is bar.
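
A quick way to check that both patterns select the same line is to run them over the two sample lines. This is a minimal Python sketch; the variable names and the choice of calling re.search once per line are illustrative assumptions, not part of the original question.

```python
import re

# The two sample lines from the question.
lines = ["My name is foo.", "My name is bar."]

# Positive lookahead: the line must contain "bar".
positive = re.compile(r"^(?=.*bar).*$")
# Negative lookahead applied before each character: no "foo" anywhere.
negative = re.compile(r"^((?!foo).)*$")

positive_hits = [line for line in lines if positive.search(line)]
negative_hits = [line for line in lines if negative.search(line)]

print(positive_hits)
print(negative_hits)
```

Both lists should contain only the second line, which matches the results quoted above.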

Why, then, does the negative lookahead need to be nested within multiple sets of parentheses, with the dot . and the quantifier * separated by a closing parenthesis, when the positive version can keep them adjacent as .* inside the lookahead?


Solution

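  • The construct in which the negative lookahead is nested inside a group together with the dot ., with the quantifier * applied to the whole group, is called a tempered greedy token. In ^((?!foo).)*$ the lookahead is checked before each character is consumed, so the dot can only advance while foo does not start at the current position; the extra parentheses are what let * repeat that check-then-consume step. You do not have to use it in this scenario.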

    You can use a normal lookahead anchored at the start instead of the tempered greedy token:

    ^(?!.*foo).*$
    


    Here,

    • ^ - matches the position at the start of the string (or of each line in multiline mode)
    • (?!.*foo) - a negative lookahead that fails the match if foo occurs anywhere on the line (or anywhere in the string if DOTALL mode is on)
    • .*$ - any zero or more characters (excluding newlines unless DOTALL mode is on) up to the end of the string/line.
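
The anchored form can be compared against the tempered greedy token on a few inputs. The sketch below is in Python; the sample strings beyond the two from the question are made up for illustration, and it only shows that the two patterns accept and reject the same lines here.

```python
import re

anchored = re.compile(r"^(?!.*foo).*$")   # one lookahead, checked once at the start
tempered = re.compile(r"^((?!foo).)*$")   # lookahead re-checked before every character

# First two lines come from the question; the rest are hypothetical test inputs.
samples = ["My name is foo.", "My name is bar.", "foo", "", "fo o"]

anchored_ok = [s for s in samples if anchored.search(s)]
tempered_ok = [s for s in samples if tempered.search(s)]

print(anchored_ok)
print(tempered_ok)
```

On these inputs both patterns keep the same strings: every sample that does not contain foo, including the empty string.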

    What to use?

    The tempered greedy token is usually much less efficient. Use the lookahead anchored at the start when you just need to check whether a string contains something or not. However, the tempered greedy token might be required in some cases. See When to Use this Technique.
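
One situation where the tempered greedy token is hard to avoid is when the restriction applies only to the stretch between two delimiters rather than to the whole line. The following Python sketch is an illustration under assumed inputs: the {START} and {END} markers and the sample text are hypothetical, not taken from the original answer.

```python
import re

# Hypothetical input with a stray opening marker before the real block.
text = "{START} a {START} b {END}"

# Plain lazy dot: starts at the first {START} and runs through the second one.
lazy = re.search(r"\{START\}.*?\{END\}", text)

# Tempered token: the dot may not step onto another {START}, so the match
# can only begin at the last opening marker before {END}.
tempered = re.search(r"\{START\}(?:(?!\{START\}).)*?\{END\}", text)

print(lazy.group(0))
print(tempered.group(0))
```

Here an anchored lookahead would not help, because the text before the match is allowed to contain {START}; only the span between the delimiters is restricted.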