Tags: regex, matching, pcre, string-matching

Refer to same branch of previous alternate group


I need to build a regex with capture groups that would result in the following:

12-34 # match: (1) (2) (3) (4)
1a-2b # match: (1) (a) (2) (b)
12-3b # nomatch

In a nutshell: if the first part consists of two digits, the second part must also consist of two digits. If the first part is a digit followed by a letter, the second part must be a digit followed by a letter as well.

In the PCRE flavor, (\d)(\d|[abc])-(\d)(\d|[abc]) also matches the third line, so it is too permissive.

Using a named group with a backreference, (\d)(?<named>\d|[abc])-(\d)(?P=named) matches none of the lines, because the backreference requires the second character of each part to be exactly the same text. It is too restrictive.
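
Both attempts can be reproduced in a short script. This is only a sketch in Python's re module rather than PCRE (an assumption made so it can be run directly); the named group is written (?P<named>...), which both flavors accept, and the helper name results is hypothetical.

```python
import re

LINES = ["12-34", "1a-2b", "12-3b"]

# Attempt 1: two independent alternations, so each part may take a different branch.
PERMISSIVE = re.compile(r"(\d)(\d|[abc])-(\d)(\d|[abc])")

# Attempt 2: a backreference repeats the captured text, not the pattern.
RESTRICTIVE = re.compile(r"(\d)(?P<named>\d|[abc])-(\d)(?P=named)")


def results(pattern):
    """Map each sample line to True if the pattern matches it (hypothetical helper)."""
    return {line: pattern.search(line) is not None for line in LINES}


print(results(PERMISSIVE))
print(results(RESTRICTIVE))
```

The first pattern accepts all three lines, including the unwanted 12-3b; the second accepts none of them.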

Is there a way I can require that my second alternate group (\d|[abc]) takes the same branch as the first (\d|[abc])?
Or do I need to fall back on the full (?:(\d)(\d)-(\d)(\d)|(\d)([abc])-(\d)([abc])) which duplicates parts of my regex?
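
For reference, the duplicated fallback does behave as required, at the cost of spreading the captures over eight group numbers. A minimal Python sketch (Python's re stands in for PCRE here, and the helper name captures is hypothetical) that collapses them back to four values:

```python
import re

# The fully spelled-out alternation from the question: groups 1-4 belong to
# the digit/digit branch, groups 5-8 to the digit/letter branch.
FULL = re.compile(r"(?:(\d)(\d)-(\d)(\d)|(\d)([abc])-(\d)([abc]))")


def captures(text):
    """Return the four captured characters, or None if text does not match.

    Hypothetical helper: groups of the branch that did not take part in the
    match are None, so they are filtered out.
    """
    m = FULL.fullmatch(text)
    if m is None:
        return None
    return tuple(g for g in m.groups() if g is not None)


for line in ["12-34", "1a-2b", "12-3b"]:
    print(line, captures(line))
```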


Solution

  • In PCRE you may use this regex:

    ^(?:(?<num>\d{2})-(?&num)|(?<alnum>\d\pL)-(?&alnum))$
    


    RegEx Details:

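    • (?<num>\d{2}): named group num, matching 2 digits
    • (?<alnum>\d\pL): named group alnum, matching 1 digit followed by a letter (\pL matches any Unicode letter, so it is broader than the [abc] used in the question)
    • (?&num): subroutine call that re-runs the sub-pattern of named group num; unlike a backreference, it does not require the same text
    • (?&alnum): subroutine call that re-runs the sub-pattern of named group alnum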
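
Subroutine calls such as (?&num) are not available in every engine; Python's built-in re module, for instance, does not support them. A rough way to get the same "define the sub-pattern once, reuse it" effect there is to assemble the regex from string fragments. The sketch below shows that swapped-in technique, not the PCRE feature itself. The names NUM, ALNUM and is_valid are hypothetical, and [A-Za-z] is an ASCII-only stand-in for \pL.

```python
import re

# Each sub-pattern is written once and interpolated twice, which mimics
# what (?<num>...) plus (?&num) achieves inside PCRE.
NUM = r"\d{2}"          # two digits
ALNUM = r"\d[A-Za-z]"   # one digit followed by an ASCII letter (assumption)

PATTERN = re.compile(rf"^(?:{NUM}-{NUM}|{ALNUM}-{ALNUM})$")


def is_valid(text):
    """True if both parts of text have the same shape (hypothetical helper)."""
    return PATTERN.fullmatch(text) is not None


for line in ["12-34", "1a-2b", "12-3b", "1a-23"]:
    print(line, is_valid(line))
```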

    Another option in PCRE is a conditional sub-pattern, where (?(num)\d{2}|\d\pL) requires \d{2} if group num took part in the match and \d\pL otherwise:

    ^(?:(?<num>\d{2})|\d\pL)-(?(num)\d{2}|\d\pL)$
    

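
Conditional groups also exist in Python's re module, so the conditional variant can be tried outside PCRE with small changes. A sketch under those assumptions: the named group is written (?P<num>...), and [A-Za-z] replaces \pL, which re does not support. The function name same_branch is hypothetical.

```python
import re

# (?(num)yes|no) picks the "yes" pattern only if group num took part in the match.
CONDITIONAL = re.compile(
    r"^(?:(?P<num>\d{2})|\d[A-Za-z])-(?(num)\d{2}|\d[A-Za-z])$"
)


def same_branch(text):
    """True if the second part has the same shape as the first (hypothetical helper)."""
    return CONDITIONAL.fullmatch(text) is not None


for line in ["12-34", "1a-2b", "12-3b", "1a-23"]:
    print(line, same_branch(line))
```

Only the first part is captured by name here; if the individual characters are needed as separate groups, extra capture groups would have to be added to each branch.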