regex, python-re

Why Python re.escape() escapes "#" character?


The re.escape() documentation says:

Changed in version 3.3: The '_' character is no longer escaped.

Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped. As a result, '!', '"', '%', "'", ',', '/', ':', ';', '<', '=', '>', '@', and "`" are no longer escaped.
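
A quick way to observe the behaviour described above, assuming Python 3.7 or later: the characters listed in the changelog pass through re.escape() unchanged, while # (and the space) still get a backslash.

```python
import re

# Characters the 3.7 changelog says are no longer escaped
no_longer_escaped = "!\"%',/:;<=>@`"
print(re.escape(no_longer_escaped))  # printed unchanged

# '#' and the space are still escaped
print(re.escape("# a"))              # \#\ a
```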

So why is the # character still escaped?


Solution

  • When the re.X / re.VERBOSE flag is used, the # character becomes special (as does unescaped whitespace outside a character class).

    Check the code snippet below:

    import re

    pattern = "# Something"
    text = "Here is # Something"
    print(re.search(pattern, text))
    # => <re.Match object; span=(8, 19), match='# Something'>
    print(re.search(pattern, text, re.X))
    # => <re.Match object; span=(0, 0), match=''>


    With re.search(pattern, text, re.X), the literal text "# Something" is not matched: # marks the start of a comment, and everything after it up to the end of the line is ignored. The pattern is therefore effectively empty, so it matches the empty string at position 0.

    That is why re.escape() escapes #: the escaped \# is treated as a literal character even when re.X / re.VERBOSE is used:

    print(re.search(re.escape(pattern), text, re.X))
    # => <re.Match object; span=(8, 19), match='# Something'>
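
The same reasoning explains why re.escape() also escapes spaces: under re.X an unescaped space is ignored. The sketch below, assuming Python 3.7 or later and redefining the same text and pattern, shows that escaping only # would not be enough in verbose mode.

```python
import re

text = "Here is # Something"
pattern = "# Something"

# re.escape() backslash-escapes both '#' and the space
escaped = re.escape(pattern)
print(escaped)  # \#\ Something

# Escaping only '#' fails under re.X: the bare space is ignored,
# so the pattern becomes "#Something", which is not in the text
print(re.search(r"\# Something", text, re.X))  # None

# The fully escaped pattern matches the literal text in verbose mode
print(re.search(escaped, text, re.X))
# <re.Match object; span=(8, 19), match='# Something'>
```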