Tags: .net, regex, lookbehind

How does atomic group inside a positive lookbehind work?


I don't understand why the regex (?<=i:(?>\D*))\d does not match the string i:>1.

The way I understand it:

  • at index 0: the lookbehind i won't match
  • at index 1: the lookbehind i: won't match
  • at index 2: the lookbehind i:(?>\D*) will match i: but the \d after the lookbehind won't match >
  • at index 3: the lookbehind i:(?>\D*) will match i:> and the \d after the lookbehind will match 1 -> the regex is satisfied

Solution

  • See Regular Expressions Cookbook: Detailed Solutions in Eight Programming Languages:

    .NET allows you to use anything inside lookbehind, and it will actually apply the regular expression from right to left. Both the regular expression inside the lookbehind and the subject text are scanned from right to left.

    The (?<=i:(?>\D*))\d pattern does not match the 1 in i:>1 because the atomic group (?>\D*) prevents any backtracking into its pattern. Since the lookbehind is applied from right to left, \D* runs first and greedily consumes >, then :, then i, which takes it to the start of the string. Nothing is left for the literal i: (matched as : and then i), and the atomic group will not give any characters back, so the lookbehind fails.

    You can also see that (?<=i:(?>[^:\d]*))\d will match the 1 in i:>1. Here [^:\d]* matches any character except : and digits, so it consumes only > and stops at the colon, leaving i: available for the rest of the lookbehind to match.
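
The right-to-left reasoning above can be traced with a small hand-written simulation. This is only a sketch: Python's built-in re module rejects variable-width lookbehind, so the code does not run the .NET pattern. It imitates the steps described in the answer for a lookbehind of the shape (?<=i:CLASS*) followed by \d. The helper names lookbehind_ok and find_digit are hypothetical, and str.isdecimal() is assumed to be a close enough stand-in for \d.

```python
def lookbehind_ok(text, pos, in_class, atomic):
    """Simulate (?<=i:(?>CLASS*)) or (?<=i:CLASS*) ending at `pos`, scanned right to left."""
    # Step 1: CLASS* is the rightmost element, so it is tried first.
    # It greedily consumes matching characters to the left of `pos`.
    left = pos
    while left > 0 and in_class(text[left - 1]):
        left -= 1
    # An atomic group keeps only its greediest result. Without it, the
    # engine may backtrack and give characters back one at a time.
    candidates = [left] if atomic else range(left, pos + 1)
    # Step 2: the literal "i:" must end exactly where CLASS* stopped.
    return any(text.endswith("i:", 0, stop) for stop in candidates)


def find_digit(text, in_class, atomic):
    """Return the index of the first digit whose lookbehind succeeds, else None."""
    for pos, ch in enumerate(text):
        if ch.isdecimal() and lookbehind_ok(text, pos, in_class, atomic):
            return pos
    return None


def non_digit(ch):
    # Stand-in for \D
    return not ch.isdecimal()


def not_colon_or_digit(ch):
    # Stand-in for [^:\d]
    return ch != ":" and not ch.isdecimal()


print(find_digit("i:>1", non_digit, atomic=True))           # None: \D* swallowed "i:>"
print(find_digit("i:>1", non_digit, atomic=False))          # 3: backtracking frees "i:"
print(find_digit("i:>1", not_colon_or_digit, atomic=True))  # 3: the class stops at ":"
```

The middle call shows the non-atomic variant (?<=i:\D*)\d for contrast: the same greedy scan happens, but the engine is allowed to give > and then : and i back until the literal i: fits.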