I'd like to split a regex onto multiple lines for clarity, but I'm not sure what the best way to do this is with raw strings.
SECT_EXP = (
r'^(?P<number>.+?[.]? {1,2}' # Begin number pattern match
r'(?P<sect_num>' # Begin section number match
r'(?P<full_num>' # Begin full number match
r'(?P<title>\d{1,2}?)' # Match title substring
r'(?P<chapter>\d{2})' # Match chapter substring
r')' # End full number match
r'[.]'
r'(?P<section>\d+)' # Match section substring
r')' # End section number match
r')' # End number pattern match
r'([.]?)[ ]*$' # Optional trailing period and spaces before end of string
)
But do I need to prefix each string with r so that the whole pattern is treated as a raw string when the pieces are joined by implicit string literal concatenation?
From the Python documentation for the re module:
re.X
re.VERBOSE
This flag allows you to write regular expressions that look nicer. Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash, and, when a line contains a '#' neither in a character class or preceded by an unescaped backslash, all characters from the leftmost such '#' through the end of the line are ignored.
That means that the two following regular expression objects that match a decimal number are functionally equal:
a = re.compile(r"""\d +  # the integral part
                   \.    # the decimal point
                   \d *  # some fractional digits""", re.X)
b = re.compile(r"\d+\.\d*")
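As the example shows, the 'r' prefix also works on a triple-quoted string, so with re.VERBOSE the whole pattern can be written as one raw string with a single prefix. With implicit concatenation, by contrast, the prefix applies to each literal separately, so every piece that contains a backslash needs its own r. Note that in verbose mode a literal space is ignored unless it is escaped or placed in a character class such as [ ].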
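Here is a rough sketch of how the pattern from the question might look as a single raw triple-quoted string with re.VERBOSE. The names concat and verbose and the sample input "Sec. 1201.5" are my own illustrative assumptions, not anything from the question. The bare space before {1,2} is written as [ ] here, because verbose mode would otherwise drop it.

```python
import re

# The question's pattern, built from concatenated raw literals.
# Each literal carries its own r prefix.
concat = re.compile(
    r'^(?P<number>.+?[.]? {1,2}'
    r'(?P<sect_num>(?P<full_num>(?P<title>\d{1,2}?)(?P<chapter>\d{2}))'
    r'[.](?P<section>\d+)))'
    r'([.]?)[ ]*$'
)

# The same pattern as one raw triple-quoted string: a single r prefix
# covers everything, and comments live inside the pattern itself.
verbose = re.compile(r"""
    ^(?P<number>.+?[.]?[ ]{1,2}       # Begin number pattern match
        (?P<sect_num>                 # Begin section number match
            (?P<full_num>             # Begin full number match
                (?P<title>\d{1,2}?)   # Match title substring
                (?P<chapter>\d{2})    # Match chapter substring
            )                         # End full number match
            [.]
            (?P<section>\d+)          # Match section substring
        )                             # End section number match
    )                                 # End number pattern match
    ([.]?)[ ]*$                       # Optional trailing period and spaces
""", re.VERBOSE)

# The prefix is per literal: r'\n' is two characters (backslash, n),
# while the unprefixed '\n' is a single newline character.
mixed = r'\n' '\n'
print(len(mixed))  # 3

m = verbose.match("Sec. 1201.5")
print(m.group('title', 'chapter', 'section'))  # ('12', '01', '5')
```

Both compiled patterns should give the same groups for the same input, so which form to use is mostly a matter of taste. The verbose form saves repeating the prefix and the quotes on every line.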