Search code examples
regexpython-3.xregex-greedy

Why does a*a match aaa?


I am using python3 re module - I find that a*a matches aaa. I thought that regex was greedy by default (unless we override it to be lazy with ?) - so, a* would have matched the entire string, and the trailing a in the pattern would fail. However, it matches:

$ import re
$ re.match(r'a*a', 'aaa')
<_sre.SRE_Match object; span=(0, 3), match='aaa'>

Should this not fail?


Solution

  • It does initially attempt to match the entire string, but repetition will backtrack if a match fails. After the a* initially matching the entire string, the regex tries to match the next token, the single a This fails, so then the a* backtracks back a character (so that it only matches aa rather than aaa). This time, the last token, the single a, is fulfilled, so a match is found.

    Greedyness doesn't mean that the regular expression will only match if the repeated token is allowed to match the whole rest of the string. It will if it can, but it'll backtrack if it can't.

    Even if greedy repetition with * backtracks down to zero length, there's no problem, because * means match zero or more times. (by contrast, repeating with +, if it backtracks down to zero length, the regular expression will fail completely, because + means that at least one repetition is required)