At the end of the page there is an attempted explanation of how greedy, reluctant and possessive quantifiers work: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html
However, I tried an example myself and I don't seem to understand it fully.
I will paste my results directly:
Enter your regex: .*+foo
Enter input string to search: xfooxxxxxxfoo
No match found.
Enter your regex: (.*)+foo
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.
Why does the first regex find no match while the second one does? What exactly is the difference between these two regexes?
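For reference, both runs can be reproduced without the tutorial's interactive harness. The sketch below calls `java.util.regex` directly and formats its output to approximate the harness messages quoted above; the class and method names (`HarnessRuns`, `report`) are illustrative and not part of the tutorial.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HarnessRuns {
    // Runs find() repeatedly and reports each match in (roughly) the harness's wording.
    static String report(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        StringBuilder out = new StringBuilder();
        boolean found = false;
        while (matcher.find()) {
            out.append(String.format("I found the text \"%s\" starting at index %d and ending at index %d.%n",
                    matcher.group(), matcher.start(), matcher.end()));
            found = true;
        }
        if (!found) {
            out.append(String.format("No match found.%n"));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.print(report(".*+foo", input));   // possessive quantifier
        System.out.print(report("(.*)+foo", input)); // quantified group
    }
}
```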
The + after another quantifier means "don't allow the regex engine to backtrack into whatever the previous token has matched". (See a tutorial on possessive quantifiers here.)
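One way to see the "no backtracking into it" rule: in `java.util.regex`, a possessive quantifier behaves like the same greedy quantifier wrapped in an atomic (independent, non-capturing) group `(?>...)`. The sketch below compares the three forms on the input from the question; the class name and the `found` helper are illustrative only.

```java
import java.util.regex.Pattern;

public class PossessiveVsAtomic {
    // True if the regex matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* may give characters back, so foo can still match.
        System.out.println(".*foo     -> " + found(".*foo", input));
        // Possessive: .*+ keeps everything it matched.
        System.out.println(".*+foo    -> " + found(".*+foo", input));
        // Atomic group: once (?>.*) is exited, the engine never re-enters it to backtrack.
        System.out.println("(?>.*)foo -> " + found("(?>.*)foo", input));
    }
}
```

On this input the first line should report `true` and the other two `false`.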
So when you apply .*foo to "xfooxxxxxxfoo", the .* first matches the entire string. Then, since foo can't be matched, the regex engine backtracks until that's possible, achieving a match when .* has matched "xfooxxxxxx" and foo has matched "foo".
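You can confirm what .* ends up with by adding a capturing group, i.e. (.*)foo, which matches the same text as .*foo. A small sketch (the class and helper names are illustrative):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyCapture {
    // Same pattern as .*foo, with a capturing group added so we can see what .* kept.
    // Returns the matcher after a successful find(), or null if there is no match.
    static Matcher greedyMatch(String input) {
        Matcher m = Pattern.compile("(.*)foo").matcher(input);
        return m.find() ? m : null;
    }

    public static void main(String[] args) {
        Matcher m = greedyMatch("xfooxxxxxxfoo");
        if (m != null) {
            System.out.println("(.*) matched \"" + m.group(1) + "\", ending at index " + m.end(1));
            System.out.println("whole match \"" + m.group() + "\", ending at index " + m.end());
        }
    }
}
```

Group 1 should be "xfooxxxxxx" (ending at index 10), and the whole match should end at index 13.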
In .*+foo, the additional + prevents that backtracking from happening: .*+ keeps the entire string, foo has nothing left to match, and so the match fails.
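To make the difference concrete, here is a deliberately simplified model of matching .*foo versus .*+foo, starting only at index 0. It is not how `java.util.regex` is implemented (it ignores other start positions and the fact that . does not match line terminators); `BacktrackModel` and `match` are hypothetical names. It records every split point it tries: the greedy version gives characters back one at a time, while the possessive version tries only the first one.

```java
import java.util.ArrayList;
import java.util.List;

public class BacktrackModel {
    // Simplified model of .*<literal> (possessive=false) or .*+<literal> (possessive=true)
    // anchored at index 0. Returns how many characters .* keeps when the literal matches,
    // or -1 on failure. Every split point tried is appended to `trace`.
    static int match(String input, String literal, boolean possessive, List<Integer> trace) {
        for (int k = input.length(); k >= 0; k--) {
            trace.add(k);
            if (input.startsWith(literal, k)) {
                return k;
            }
            if (possessive) {
                break; // possessive: never give back any characters
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        List<Integer> greedyTrace = new ArrayList<>();
        List<Integer> possessiveTrace = new ArrayList<>();
        System.out.println("greedy:     " + match(input, "foo", false, greedyTrace) + " tried " + greedyTrace);
        System.out.println("possessive: " + match(input, "foo", true, possessiveTrace) + " tried " + possessiveTrace);
    }
}
```

In this model the greedy run tries splits 13, 12, 11 and 10 and succeeds at 10; the possessive run tries only 13 and fails.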
When you write (.*)+foo, the + takes on an entirely different meaning: now it means "one or more of the preceding token", here the group (.*). You've created nested quantifiers, which, by the way, is not a good idea. If you apply that regex to a string like "xfoxxxxxxxxxfox", you'll run into catastrophic backtracking.
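A rough intuition for why nesting hurts: when the overall match fails, (.*)+ can divide the same n characters among its iterations in many ways, and the engine may try a large share of them. The sketch below counts the ways to cut n characters into non-empty consecutive pieces (the compositions of n, 2^(n-1) of them). This is only a model of the search space, not a measurement of `java.util.regex`, whose exact step count also depends on zero-length iterations and on every starting index `find()` tries. It also checks that the un-nested .*foo finds the same text as (.*)+foo on the question's input. The class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NestedQuantifier {
    // Number of ways to cut n characters into one or more non-empty consecutive pieces.
    // Each piece stands for one iteration of (.*) inside (.*)+.
    static long splits(int n) {
        if (n == 0) {
            return 1; // nothing left to split: exactly one way to finish
        }
        long total = 0;
        for (int first = 1; first <= n; first++) {
            total += splits(n - first); // choose the first piece, then split the rest
        }
        return total;
    }

    // The text of the first match, or null if there is none.
    static String wholeMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println("(.*)+foo -> " + wholeMatch("(.*)+foo", input));
        System.out.println(".*foo    -> " + wholeMatch(".*foo", input));
        String bad = "xfoxxxxxxxxxfox";
        System.out.println(bad.length() + " chars -> " + splits(bad.length()) + " ways to split");
    }
}
```

For the 15-character string above, the model gives 16384 ways to split, and each extra character doubles that count.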