Search code examples
regexregex-greedynon-greedy

Regular expression in regards to question mark "lazy" mode


I understand the ? mark here means "lazy".

My question essentially is [0-9]{2}? vs [0-9]{2}

Are they same?
If so, why are we writing the former expression? Aren't lazy mode more expensive performance wise?
If not, can you tell the difference?


Solution

  • There is not a difference between [0-9]{2} and [0-9]{2}?.

    The difference between greedy matching and lazy matching (the addition of a ?) has to do with backtracking. Regular expression engines are built to match text (from left to right). Therefore it is logical that when you ask an expression to match a range of character(s), it matches as many as possible.


    Assume we have the string acac123.

    If we use a greedy match of [a-z]+c (+ standing for 1+ repetitions or {1,}):

    • [a-z]+ would match acac and fail at 1
    • then we would try to match the c, but fail at 1
    • now we start backtracking, and successfully match aca and c

    If we make this lazy ([a-z]+?c), we will get both a different response (in this case) and be more efficient:

    • [a-z]+? would match a, but stop because it sees the next character matches the rest of the expression c
    • the c would then match, successfully matching a and c (with no backtracking)

    Now you can see that there will be no difference between X{#} and X{#}?, because {#} is not a range and even a greedy match will not experience any backtracking. Lazily matches are often used with * (0+ repetitions or {0,}) or +, but can also be used with ranges {m,n} (where n is optional).

    This is essential when you want to match the least amount of characters possible and you will often see .*? in an expression when you want to fill up some space (foo.*?bar on a string foo bar filler text bar). However, many times a lazy match is an example of bad/inefficient regex. Many people will do something like foo:"(.*?)" to match everything within double quotes, when you can avoid a lazy match by writing your expression like foo:"([^"]+)" and match anything but "s.


    Final note, ? typically means "optional" or match {0,1} times. ? only will make a match lazy if you use it on a range ({m,n}, *, +, or another ?). This means X? will not make X lazy (since we already said {#}? is pointless), but instead it will be optional. However, you can do a lazy "optional" match: [0-9]?? will lazily match 0-1 times.