I understand the ?
mark here means "lazy".
My question essentially is [0-9]{2}?
vs [0-9]{2}
Are they same?
If so, why are we writing the former expression? Aren't lazy mode more expensive performance wise?
If not, can you tell the difference?
There is not a difference between [0-9]{2}
and [0-9]{2}?
.
The difference between greedy matching and lazy matching (the addition of a ?
) has to do with backtracking. Regular expression engines are built to match text (from left to right). Therefore it is logical that when you ask an expression to match a range of character(s), it matches as many as possible.
Assume we have the string acac123
.
If we use a greedy match of [a-z]+c
(+
standing for 1+ repetitions or {1,}
):
[a-z]+
would match acac
and fail at 1
c
, but fail at 1
aca
and c
If we make this lazy ([a-z]+?c
), we will get both a different response (in this case) and be more efficient:
[a-z]+?
would match a
, but stop because it sees the next character matches the rest of the expression c
c
would then match, successfully matching a
and c
(with no backtracking)Now you can see that there will be no difference between X{#}
and X{#}?
, because {#}
is not a range and even a greedy match will not experience any backtracking. Lazily matches are often used with *
(0+ repetitions or {0,}
) or +
, but can also be used with ranges {m,n}
(where n
is optional).
This is essential when you want to match the least amount of characters possible and you will often see .*?
in an expression when you want to fill up some space (foo.*?bar
on a string foo bar filler text bar
). However, many times a lazy match is an example of bad/inefficient regex. Many people will do something like foo:"(.*?)"
to match everything within double quotes, when you can avoid a lazy match by writing your expression like foo:"([^"]+)"
and match anything but "
s.
Final note, ?
typically means "optional" or match {0,1}
times. ?
only will make a match lazy if you use it on a range ({m,n}
, *
, +
, or another ?
). This means X?
will not make X
lazy (since we already said {#}?
is pointless), but instead it will be optional. However, you can do a lazy "optional" match: [0-9]??
will lazily match 0-1 times.