Search code examples
regexregex-greedy

RegEx greedy matching and backtracking


I am trying to gain better understanding of the backtracking process for greedy matching.

Requesting your help to confirm/correct my assertion mentioned below.

Regex: .*man

Test string: ithmati

I used regex101.com debugger and captured the first 11 steps of the matching process into a pic which I have attached to this post.

Assertion: In step 9, the reason why engine backtracked in the test string to "h" is because it had already backtracked to "m" in step 6 so the next best course is to go further back.

Pic: Greedy backtracking

Pic: Non-greedy backtracking


Solution

  • The steps you showed seem like exactly what I would have expected here. In step #9, the engine fails to find the letter n in the next position, to complete the match for .*man. Since .* is greedy, the engine starts at the end of the string, and tries to find the latest occurrence of man first. Since this did not happen with the first ma match, the engine falls backwards to try find an earlier match. It does not find any earlier chance for a match, so the pattern then fails without a match.