How do I match a series of groups but later groups should only match if all previous groups matched?
Example:
Match any non-empty prefix of "abc", with each character in its own group.
Valid: a, ab, abc
Invalid: b, c, bc, ac
The following works, but I'm not sure if there's a better way:
^(a)?(?(1)(b)?)(?(2)(c)?)$
That says that each group is optional, but can only match if the group before it matched. That is, 'c' cannot match unless 'b' matches, which cannot happen unless 'a' matches.
To help anyone googling this later: I'm parsing a DICOM DateTime, which has the following format.
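A quick way to check the conditional pattern is to run it against the valid and invalid examples. This is a minimal sketch using Python's re module; because every group is optional, the pattern also accepts the empty string.

```python
import re

# Conditional pattern from the question: (?(1)...) only tries 'b' if group 1
# ('a') matched, and (?(2)...) only tries 'c' if group 2 ('b') matched.
pattern = re.compile(r"^(a)?(?(1)(b)?)(?(2)(c)?)$")

valid = ["a", "ab", "abc"]
invalid = ["b", "c", "bc", "ac"]

for s in valid + invalid:
    m = pattern.match(s)
    print(repr(s), "->", m.groups() if m else None)

# Every group is optional, so the empty string matches as well.
print(repr(""), "->", pattern.match("").groups())
```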
YYYYMMDDHHMMSS.FFFFFF&ZZXX # &ZZXX is an optional timezone offset
I built this regex in code, as below, rather than typing it in as one long string.
import re

# year is required; each later field is only tried if the field before it matched
dicom_dt_parser = re.compile(
r'^' +
r'(?P<year>\d{4})' +
r'(?(year)(?P<month>\d{2})?)' +
r'(?(month)(?P<day>\d{2})?)' +
r'(?(day)(?P<hour>\d{2})?)' +
r'(?(hour)(?P<min>\d{2})?)' +
r'(?(min)(?P<sec>\d{2})?)' +
r'(?(sec)(?P<frac>\.\d{1,6})?)' +
r'(?P<tz>[\+\-]\d{4})?' +
r'$'
)
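dicom_dt_parser.match(datetime_string).groupdict()
will return a dictionary with all of the fields. Missing fields will have values of None. If the string does not match at all, match() returns None, so check for that before calling groupdict().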
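As a usage sketch, the same pattern can be wrapped so that a non-matching string gives None instead of an AttributeError. The parse_dicom_dt helper and the sample strings are hypothetical, chosen only for illustration; they are not part of any DICOM library.

```python
import re

# Same pattern as in the question, written with implicit string concatenation.
DICOM_DT = re.compile(
    r"^(?P<year>\d{4})"
    r"(?(year)(?P<month>\d{2})?)"
    r"(?(month)(?P<day>\d{2})?)"
    r"(?(day)(?P<hour>\d{2})?)"
    r"(?(hour)(?P<min>\d{2})?)"
    r"(?(min)(?P<sec>\d{2})?)"
    r"(?(sec)(?P<frac>\.\d{1,6})?)"
    r"(?P<tz>[\+\-]\d{4})?"
    r"$"
)


def parse_dicom_dt(value):
    """Hypothetical helper: return the field dict, or None if value does not match."""
    m = DICOM_DT.match(value)
    return m.groupdict() if m else None


print(parse_dicom_dt("20230415"))
print(parse_dicom_dt("20230415103000.123456+0100"))
print(parse_dicom_dt("2023+0100"))  # timezone may follow any component
print(parse_dicom_dt("2023041"))    # 7 digits cannot be split into the fields
```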
What you are doing is perfectly fine, readable and straightforward. Here is another, shorter way to build your regex, using nested groups:
^a(b(c)?)?$
If you also want to accept an empty input string, append |^$
to the regex above.
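Here is a quick check of the nested pattern, and of the empty-string variant, against the question's examples. It is a sketch using Python's re; note that the capture groups are numbered differently from the conditional version ('a' is not captured), so code that reads groups by number would need adjusting.

```python
import re

# Nested groups: 'b' can only appear inside the group that follows 'a',
# and 'c' only inside the group that contains 'b'.
nested = re.compile(r"^a(b(c)?)?$")

# Variant that also accepts the empty string.
nested_or_empty = re.compile(r"^a(b(c)?)?$|^$")

for s in ["a", "ab", "abc", "b", "c", "bc", "ac", ""]:
    print(repr(s), bool(nested.match(s)), bool(nested_or_empty.match(s)))

# Group 1 holds "b" or "bc", group 2 holds "c".
print(nested.match("abc").groups())
```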
The regex for abcdef would be:
^a(b(c(d(e(f)?)?)?)?)?$
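Since the nesting follows a fixed rule, it can be generated rather than typed by hand. A minimal sketch; the helper name nested_prefix_regex is hypothetical. It builds the groups from the innermost character outwards, and re.escape keeps it safe for characters that are special in regex syntax.

```python
import re


def nested_prefix_regex(chars):
    """Hypothetical helper: build a regex matching any non-empty prefix of chars.

    Works from the innermost group outwards, so each later character
    sits inside the optional group of the one before it.
    """
    if not chars:
        raise ValueError("chars must be non-empty")
    inner = ""
    for ch in reversed(chars[1:]):
        inner = "(" + re.escape(ch) + inner + ")?"
    return "^" + re.escape(chars[0]) + inner + "$"


pattern = nested_prefix_regex("abcdef")
print(pattern)
print([s for s in ["a", "abc", "abcdef", "abd", "bcd"] if re.match(pattern, s)])
```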
Your regex built with this approach (the optional timezone stays outside the nesting, so, as in your version, it can follow any component):
^(?P<year>\d{4})(?:(?P<month>\d{2})(?:(?P<day>\d{2})(?:(?P<hour>\d{2})(?:(?P<min>\d{2})(?:(?P<sec>\d{2})(?P<frac>\.\d{1,6})?)?)?)?)?)?(?P<tz>[+-]\d{4})?$
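The nested DICOM pattern can also be generated from a table of fields, much like the question builds its conditional version in code. This is a sketch; FIELDS and build_nested are hypothetical names, and the timezone is appended outside the nesting so it can follow any component. The loop at the end compares its results with the conditional pattern on a few sample strings.

```python
import re

# Hypothetical field table: (group name, sub-pattern) in DICOM DT order.
# Each field is allowed only if every field before it matched.
FIELDS = [
    ("month", r"\d{2}"),
    ("day", r"\d{2}"),
    ("hour", r"\d{2}"),
    ("min", r"\d{2}"),
    ("sec", r"\d{2}"),
    ("frac", r"\.\d{1,6}"),
]


def build_nested(fields):
    """Wrap each field, plus everything after it, in one optional group."""
    inner = ""
    for name, sub in reversed(fields):
        inner = f"(?:(?P<{name}>{sub}){inner})?"
    return inner


nested = re.compile(r"^(?P<year>\d{4})" + build_nested(FIELDS) + r"(?P<tz>[+-]\d{4})?$")

# The conditional version from the question, for comparison.
conditional = re.compile(
    r"^(?P<year>\d{4})(?(year)(?P<month>\d{2})?)(?(month)(?P<day>\d{2})?)"
    r"(?(day)(?P<hour>\d{2})?)(?(hour)(?P<min>\d{2})?)(?(min)(?P<sec>\d{2})?)"
    r"(?(sec)(?P<frac>\.\d{1,6})?)(?P<tz>[\+\-]\d{4})?$"
)


def fields(m):
    """Field dict for a match object, or None when there was no match."""
    return m.groupdict() if m else None


samples = ["2023", "20230415", "20230415103000.123456+0100", "2023+0100", "2023041"]
for s in samples:
    print(s, fields(nested.match(s)), fields(nested.match(s)) == fields(conditional.match(s)))
```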
Your own regex:
^(?P<year>\d{4})(?(year)(?P<month>\d{2})?)(?(month)(?P<day>\d{2})?)(?(day)(?P<hour>\d{2})?)(?(hour)(?P<min>\d{2})?)(?(min)(?P<sec>\d{2})?)(?(sec)(?P<frac>\.\d{1,6})?)(?P<tz>[\+\-]\d{4})?$