I need to build a regex with capture groups that would result in the following:
12-34 # match: (1) (2) (3) (4)
1a-2b # match: (1) (a) (2) (b)
12-3b # nomatch
In a nutshell, if the first part has two digits, then the second part must also have two digits. And if it has a letter, then the second part must also have a letter.
In PCRE flavor, (\d)(\d|[abc])-(\d)(\d|[abc]) matches the third line, so it is too permissive.
Using a named group with a backreference, (\d)(?<named>\d|[abc])-(\d)(?P=named) matches none of the lines above, because the backreference requires the second character of each part to be exactly the same text. It is too restrictive.
Is there a way I can require that my second alternate group (\d|[abc]) takes the same branch as the first (\d|[abc])?
Or do I need to fall back on the full (?:(\d)(\d)-(\d)(\d)|(\d)([abc])-(\d)([abc])) which duplicates parts of my regex?
In PCRE you may use this regex:
^(?:(?<num>\d{2})-(?&num)|(?<alnum>\d\pL)-(?&alnum))$
RegEx Details:
(?<num>\d{2}): named group num for matching 2 digits
(?<alnum>\d\pL): named group alnum for matching 1 digit followed by a letter
(?&num): match the same sub-pattern as in named group num
(?&alnum): match the same sub-pattern as in named group alnum
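If you are not on PCRE, subroutine calls such as (?&num) may not be available. As a rough sketch only, and assuming Python's standard re module (which, as far as I know, supports neither (?&name) nor \pL), the same "define once, reuse twice" idea can be approximated by composing the pattern from string fragments. The names NUM, ALNUM, PAIR and same_shape are made up for this illustration, and [^\W\d_] is used as a stand-in for \pL:

```python
import re

# Hypothetical fragments standing in for the PCRE named groups
# (?<num>...) and (?<alnum>...); they are plain strings, not subroutines.
NUM = r"\d{2}"          # two digits
ALNUM = r"\d[^\W\d_]"   # one digit followed by a letter (approximates \pL)

# Each fragment is written once and interpolated on both sides of the hyphen.
PAIR = re.compile(rf"^(?:{NUM}-{NUM}|{ALNUM}-{ALNUM})$")


def same_shape(text: str) -> bool:
    """Return True if both parts of `text` have the same shape."""
    return PAIR.match(text) is not None


for sample in ("12-34", "1a-2b", "12-3b"):
    print(sample, same_shape(sample))
```

This avoids duplicating the sub-patterns in the source, although the compiled regex still contains each fragment twice.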
Another option is to use conditional sub-patterns in PCRE as:
^(?:(?<num>\d{2})|\d\pL)-(?(num)\d{2}|\d\pL)$
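The conditional approach can also keep one capture group per character, as the question asks. The following is a minimal sketch, assuming Python's re module rather than PCRE (so the group is written (?P<digit>...)) and using the question's [abc] class instead of \pL. PATTERN, digit and split_chars are names invented for this example:

```python
import re

# The named group "digit" records which branch the first part took;
# the conditional (?(digit)...|...) forces the second part down the same branch.
PATTERN = re.compile(
    r"^(\d)(?:(?P<digit>\d)|([abc]))"   # first part: digit + (digit | letter)
    r"-(\d)(?(digit)(\d)|([abc]))$"     # second part: same branch as the first
)


def split_chars(text):
    """Return the four captured characters, or None if there is no match."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # Groups belonging to the branch that was not taken are None; drop them.
    return tuple(g for g in m.groups() if g is not None)


for sample in ("12-34", "1a-2b", "12-3b"):
    print(sample, split_chars(sample))
```

The cost is that the captures land in different group numbers depending on the branch, so the unused groups have to be filtered out afterwards.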