regex, regex-group

Reusing branch reset group doesn't match all the alternatives


I am trying to validate an IPv4 address using the regex below:

^((?|([0-9][0-9]?)|(1[0-9][0-9])|(2[0-5][0-5]))\.){3}(?2)$

In most cases the regex works fine through the third octet of the IP address. In the last octet, however, it sometimes matches only the first alternative in the branch reset group and ignores the other alternatives altogether. I know that all alternatives in a branch reset group share the same capturing group number. I tried reusing the capture group as described in this StackOverflow post, but it only worked partially.

RegEx match results


Solution

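  • The reason is that the (?2) regex subroutine recurses only the first capturing group pattern with the ID 2, ([0-9][0-9]?), not the other alternatives of the branch reset group. That pattern matches at most two digits, so if the last octet has three digits the subroutine cannot consume it all (the $ requires the end of the string right after it), backtracking starts, and the match eventually fails.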

    The correct approach to recurse a group of patterns is to avoid using a branch reset group and capture all alternatives into a single capturing group that will be recursed:

    ^(?:(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?1)$
    //  |____________ Group 1 _______________|        \_ Regex subroutine
    

    See the regex demo.

    Note that the octet pattern is a bit different; it is taken from How to Find or Validate an IP Address. Your octet pattern is wrong because 2[0-5][0-5] does not match the numbers between 200 and 255 that end in 6, 7, 8 or 9 (206, 219, 249 and so on).
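
If you want to cross-check both points outside a PCRE tester, here is a minimal Python sketch. It assumes the standard `re` module, which supports neither branch reset groups nor `(?1)` subroutine calls. The octet pattern is therefore reused through string interpolation rather than recursion, which is a substitute technique and not the PCRE approach above. The names `OCTET`, `IPV4`, `QUESTION_OCTET`, `is_ipv4` and `missed_octets` are made up for illustration.

```python
import re

# One octet, 0-255 (same alternatives as group 1 in the answer's regex).
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Assumption: no (?1) support in `re`, so the octet text is interpolated
# twice instead of being recursed as a subroutine.
IPV4 = re.compile(rf"^(?:{OCTET}\.){{3}}{OCTET}$")

# The three octet alternatives from the question, kept for comparison.
QUESTION_OCTET = re.compile(r"(?:[0-9][0-9]?|1[0-9][0-9]|2[0-5][0-5])")


def is_ipv4(text):
    """Return True if text is a dotted-quad IPv4 address."""
    return IPV4.fullmatch(text) is not None


def missed_octets():
    """Return the values in 0-255 that the question's octet pattern rejects."""
    return [n for n in range(256) if not QUESTION_OCTET.fullmatch(str(n))]


if __name__ == "__main__":
    for candidate in ("192.168.1.209", "10.0.0.1", "256.1.1.1", "1.2.3"):
        print(candidate, is_ipv4(candidate))
    print(missed_octets())
```

`missed_octets()` returns the 20 values from 206 to 249 whose last digit is 6 to 9, which is exactly the gap left by 2[0-5][0-5].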