regex language-agnostic greedy regex-greedy

Matching text between delimiters: greedy or lazy regular expression?

For the common problem of matching text between delimiters (e.g. < and >), there are two common patterns:

using the greedy * or + quantifier in the form START [^END]* END, e.g. <[^>]*>, or
using the lazy *? or +? quantifier in the form START .*? END, e.g. <.*?>.
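
To make the two forms concrete, here is a minimal sketch in Python (the language and the sample string are only assumptions for illustration; the question itself is language-agnostic). In a simple case like this, both patterns find the same matches:

```python
import re

# Hypothetical sample input with two delimited spans.
text = "a <b> c <i> d"

# Form 1: greedy quantifier on a negated character class.
greedy_negated = re.findall(r"<[^>]*>", text)

# Form 2: lazy quantifier on the dot.
lazy_dot = re.findall(r"<.*?>", text)

print(greedy_negated)  # ['<b>', '<i>']
print(lazy_dot)        # ['<b>', '<i>']
```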

Is there a particular reason to favour one over the other?

Solution

Each form has some advantages:

[^>]*:

More expressive: the pattern states exactly which characters may appear between the delimiters.
Matches newlines regardless of the /s flag.
Generally considered quicker, because the engine doesn't have to backtrack to find a successful match: with [^>] the engine makes no choices, since we give it only one way to match the pattern against the string.
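
The points above can be sketched in Python (an assumed language; `re.DOTALL` is Python's equivalent of the /s flag, and the sample strings are made up). The first part shows the newline behaviour. The second part shows what "only one way to match" means in practice: when more pattern follows the closing delimiter, the lazy dot is free to run past a `>` to make the overall match succeed, while the negated class is not.

```python
import re

# Newline between the delimiters.
multiline = "<a\nb>"

negated = re.search(r"<[^>]*>", multiline)                # matches across the newline
lazy_default = re.search(r"<.*?>", multiline)             # '.' does not match '\n' by default
lazy_dotall = re.search(r"<.*?>", multiline, re.DOTALL)   # matches once dotall is enabled

print(negated.group() == "<a\nb>")      # True
print(lazy_default)                     # None
print(lazy_dotall.group() == "<a\nb>")  # True

# Extra pattern after the closing delimiter: the lazy dot may cross a '>'.
trailing = "<a>y<b>x"

negated_x = re.search(r"<[^>]*>x", trailing).group()
lazy_x = re.search(r"<.*?>x", trailing).group()

print(negated_x)  # <b>x
print(lazy_x)     # <a>y<b>x
```

Whether the second behaviour is wanted depends on the task; the sketch only shows that the two forms stop being interchangeable once the pattern continues after the end delimiter.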

.*?:

No "code duplication" - the end character only appears once.
Simpler when the end delimiter is more than one character long, since a character class does not work in that case. A common alternative there is (?:(?!END).)*, which gets even worse if the END delimiter is itself another pattern.
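
As a rough illustration of the multi-character case, here is a Python sketch using `<!--` and `-->` as the delimiters (the delimiters and the sample text are assumptions chosen for the example). The lazy form and the (?:(?!END).)* alternative agree, while a naive single-character class rejects any content that contains a lone hyphen:

```python
import re

# Hypothetical input: the first span contains a single '-' in its content.
text = "<!-- a-b --> text <!-- two -->"

# Lazy quantifier: the end delimiter appears once, as-is.
lazy = re.findall(r"<!--.*?-->", text)

# The (?:(?!END).)* alternative: consume a character only if END does not start here.
tempered = re.findall(r"<!--(?:(?!-->).)*-->", text)

# Naive character class: excludes every '-', not just the full '-->' delimiter.
char_class = re.findall(r"<!--[^-]*-->", text)

print(lazy)        # ['<!-- a-b -->', '<!-- two -->']
print(tempered)    # ['<!-- a-b -->', '<!-- two -->']
print(char_class)  # ['<!-- two -->']
```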