
Regexp matching a string - positive lookahead


Regexp: (?=(\d+))\w+\1
String: 456x56

Hi,

I don't understand how this regex matches "56x56" in the string "456x56". Here is how I expected it to work:

  1. The lookahead, (?=(\d+)), captures 456 and stores it in \1 (the group (\d+)).
  2. The word-character part, \w+, matches the whole string ("456x56").
  3. \1, which is 456, must then appear right after whatever \w+ matched.
  4. After backtracking through the string, no match should be found, because there is no "456" preceded by a word character.

However, the regexp matches "56x56".
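
The behaviour described above can be reproduced with a short script. The question does not name a regex engine, so Python's re module is used here purely as an assumption; other backtracking engines should behave the same way for this pattern.

```python
import re

# Pattern and input taken verbatim from the question.
pattern = re.compile(r"(?=(\d+))\w+\1")
m = pattern.search("456x56")

print(m.group(0))  # the overall match
print(m.group(1))  # what the lookahead captured for \1
print(m.start())   # offset in the input where the match begins
```

The match starts at offset 1, not 0, and group 1 holds "56" rather than "456", which is the surprising part.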


Solution

  • Your regex is not anchored, as has been said, so the engine is free to retry the match from a later position in the input. Another problem is that \w also matches digits. Here is how the regex engine proceeds with your input:

    # begin
    regex: |(?=(\d+))\w+\1
    input: |456x56
    # lookahead (first group = '456')
    regex: (?=(\d+))|\w+\1
    input: |456x56 
    # \w+
    regex: (?=(\d+))\w+|\1
    input: 456x56|
    # \1 cannot be satisfied: backtrack on \w+
    regex: (?=(\d+))\w+|\1
    input: 456x5|6 
    # And again, and again... until \w+ has nothing left to give up: \1 ('456') never matches
    # The regex engine therefore retries the whole pattern from the next character:
    regex: |(?=(\d+))\w+\1
    input: 4|56x56
    # lookahead (first group = '56')
    regex: (?=(\d+))|\w+\1
    input: 4|56x56
    # \w+
    regex: (?=(\d+))\w+|\1
    input: 456x56|
    # \1 cannot be satisfied: backtrack
    regex: (?=(\d+))\w+|\1
    input: 456x5|6
    # \1 cannot be satisfied: backtrack
    regex: (?=(\d+))\w+|\1
    input: 456x|56
    # \1 satisfied: match
    regex: (?=(\d+))\w+\1|
    input: 4<56x56>
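
The trace above can be approximated in code. This is a minimal sketch, assuming Python's re module; the helper name trace_attempts is hypothetical and only mimics the "retry from the next character" behaviour of an unanchored search by calling match() at each offset. It also shows that anchoring the pattern with ^ makes the overall match fail, which is the result the question expected.

```python
import re

PATTERN = r"(?=(\d+))\w+\1"
TEXT = "456x56"

def trace_attempts(pattern, text):
    """Try the pattern at each start offset, as an unanchored search would.

    Returns a list of (offset, lookahead_capture, matched_text) tuples and
    stops at the first offset where the full pattern matches.
    """
    compiled = re.compile(pattern)
    lookahead_only = re.compile(r"(?=(\d+))")  # just the lookahead part
    attempts = []
    for start in range(len(text) + 1):
        peek = lookahead_only.match(text, start)
        captured = peek.group(1) if peek else None
        m = compiled.match(text, start)  # match() only tries at `start`
        attempts.append((start, captured, m.group(0) if m else None))
        if m:
            break
    return attempts

attempts = trace_attempts(PATTERN, TEXT)
for start, captured, matched in attempts:
    print(start, captured, matched)

# With an anchor, the engine may not move on to offset 1, so nothing matches.
anchored = re.search(r"^(?=(\d+))\w+\1", TEXT)
print(anchored)
```

At offset 0 the lookahead captures "456" and the full pattern fails; at offset 1 it captures "56" and the pattern matches "56x56". The anchored search returns None.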