Search code examples
regexregex-greedy

Difference (if any) between .+? and .*


I am looking through some old code bases and have come across two regular expression parts that I think are semantically identical. Wondering it the Stackoverflow community can confirm my understanding.

RegEx 1: (.+?) - One or more characters, but optional

RegEx 2: (.*) - Zero or more characters

I keep thinking of different scenarios but can't think of any input where both expressions wouldn't be the same.


Solution

  • (.+?) means match one or more character, but instead of the default greedy matching (match as much as possible), the ? after the quantifier makes the matching lazy (match as few as possible).

    Conceptually, greedy matching will first try the longest possible sequence that can be formed by the pattern inside, then gradually reduce the length of the sequence as the engine backtracks. Lazy matching will first try the shortest possible sequence that can be formed by the pattern inside, then gradually increase the length of the sequence as the engine backtracks.

    Therefore, (.+?) and (.*) are completely different. Given a string "abc", the pattern (.+?) will match "a" for the first match, while (.*) will match "abc" for the first match.

    When you correct the pattern to the intended meaning: ((?:.+)?), it is exactly the same as (.*) in behavior. Since quantifier is greedy by default, ((?:.+)?) will first try the case of .+ before attempting the case of empty string. And .+ will try the longest sequence before the 1 character sequence. Therefore, the effect of ((?:.+)?) is exactly the same as (.*): it will find the longest sequence and backtrack gradually to the case of empty string.