
What's required for implicit line joins on raw strings in Python


I'd like to split a regex across multiple lines for clarity, but I'm not sure of the best way to do this with raw strings.

SECT_EXP = (
    r'^(?P<number>.+?[.]? {1,2}'  # Begin number pattern match
    r'(?P<sect_num>'  # Begin section number match
    r'(?P<full_num>'  # Begin full number match
    r'(?P<title>\d{1,2}?)'  # Match title substring
    r'(?P<chapter>\d{2})'  # Match chapter substring
    r')'  # End full number match
    r'[.]'
    r'(?P<section>\d+)'  # Match section substring
    r')'  # End section number match
    r')'  # End number pattern match
    r'([.]?)[ ]*$'  # Optional trailing period and spaces, then end of string
)

Do I need to prefix each string with r to make sure the whole pattern is treated as a raw string when implicit line joining is used?


Solution

  • From the documentation for Python's re module:

    re.X
    re.VERBOSE
    

    This flag allows you to write regular expressions that look nicer. Whitespace within the pattern is ignored, except when in a character class or preceded by an unescaped backslash, and, when a line contains a '#' neither in a character class or preceded by an unescaped backslash, all characters from the leftmost such '#' through the end of the line are ignored.

    That means that the two following regular expression objects that match a decimal number are functionally equal:

    a = re.compile(r"""\d +  # the integral part
                       \.    # the decimal point
                       \d *  # some fractional digits""", re.X)
    
    b = re.compile(r"\d+\.\d*")
    

    The 'r' prefix works with a triple-quoted string, as the example above shows, so a single raw literal can hold the whole pattern along with its comments.
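
On the original question: the r prefix belongs to each string literal individually, and adjacent literals are only concatenated after each one has been processed. With implicit line joining, every piece that contains a backslash therefore needs its own r. The sketch below illustrates this and then shows one possible re.VERBOSE rewrite of the pattern from the question. The names joined_raw, mixed and SECT_EXP_VERBOSE and the sample input are made up for illustration. Note one change that verbose mode forces: a bare space is ignored there, so the literal space before {1,2} is written as [ ].

```python
import re

# Each literal keeps its own prefix; concatenation happens afterwards.
joined_raw = (
    r'\bcat'
    r'\b'    # raw: backslash + 'b', i.e. a regex word boundary
)
mixed = (
    r'\bcat'
    '\b'     # NOT raw: '\b' is a single backspace character (0x08)
)

print(re.search(joined_raw, 'a cat') is not None)  # True
print(re.search(mixed, 'a cat') is not None)       # False

# The pattern from the question as one raw triple-quoted literal.
# Whitespace outside character classes is ignored under re.VERBOSE,
# so the literal space is spelled [ ] here.
SECT_EXP_VERBOSE = re.compile(r"""
    ^(?P<number>.+?[.]?[ ]{1,2}   # Begin number pattern match
      (?P<sect_num>               # Begin section number match
        (?P<full_num>             # Begin full number match
          (?P<title>\d{1,2}?)     # Match title substring
          (?P<chapter>\d{2})      # Match chapter substring
        )                         # End full number match
        [.]
        (?P<section>\d+)          # Match section substring
      )                           # End section number match
    )                             # End number pattern match
    ([.]?)[ ]*$                   # Optional trailing period and spaces
""", re.VERBOSE)

# Hypothetical sample input, chosen only to exercise the named groups.
m = SECT_EXP_VERBOSE.match('Sec. 1201.5.')
print(m.group('title'), m.group('chapter'), m.group('section'))  # 12 01 5
```

Both styles work. The parenthesized form keeps ordinary Python comments outside the pattern, at the cost of repeating the r prefix on every line. The verbose form needs only one prefix, but spaces and '#' characters meant literally must be escaped or placed in a character class.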