Search code examples
javaregexgreedyquantifiers

How exactly does the possessive quantifier work?


At the end of the page there is at attempted explanation of how do greedy, reluctant and possessive quantifiers work: http://docs.oracle.com/javase/tutorial/essential/regex/quant.html

However I tried myself an example and I don't seem to understand it fully.

I will paste my results directly:

Enter your regex: .*+foo
Enter input string to search: xfooxxxxxxfoo
No match found.

Enter your regex: (.*)+foo
Enter input string to search: xfooxxxxxxfoo
I found the text "xfooxxxxxxfoo" starting at index 0 and ending at index 13.

Why does the first reg.exp. find no match and the second does? What is the exact difference between those 2 reg.exp.?


Solution

  • The + after another quantifier means "don't allow the regex engine to backtrack into whatever the previous token has matched". (See a tutorial on possessive quantifiers here).

    So when you apply .*foo to "xfooxxxxxxfoo", the .* first matches the entire string. Then, since foo can't be matched, the regex engine backtracks until that's possible, achieving a match when .* has matched "xfooxxxxxx" and foo has matched "foo".

    Now the additional + prevents that backtracking from happening, so the match fails.

    When you write (.*)+foo. the + takes on an entirely different meaning; now it means "one or more of the preceding token". You've created nested quantifiers, which is not a good idea, by the way. If you apply that regex to a string like "xfoxxxxxxxxxfox", you'll run into catastrophic backtracking.