
Python regex only matches in single-line mode not multi-line mode


Why are there no regex matches when this is multiline, but it works on one line?

Python 3.8.6 | packaged by conda-forge | (default, Dec 26 2020, 05:05:16) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.20.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import re

In [2]: msg = r"""
   ...: (\(1054, "Unknown column 'inf(e0)?' in 'field list'"\))
   ...: |
   ...: (ProgrammingError: inf can not be used with MySQL)
   ...: """

In [3]: err_text = 'ProgrammingError: inf can not be used with MySQL'

In [4]: re.search(msg, err_text, re.MULTILINE | re.VERBOSE)

But if I keep the pattern on a single line and omit re.MULTILINE | re.VERBOSE, it works:

In [5]: msg2 = r"""(\(1054, "Unknown column 'inf(e0)?' in 'field list'"\))|(ProgrammingError: inf can not be used with MySQL)"""

In [6]: re.search(msg2, err_text)
Out[6]: <re.Match object; span=(0, 48), match='ProgrammingError: inf can not be used with MySQL'>

I've been trying to figure it out here https://regex101.com/r/tkju6f/1 but no luck.

(for this PR)


Solution

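  • Two things go wrong here. Without re.VERBOSE, the newlines in the pattern are matched literally rather than ignored. With re.VERBOSE they are ignored, but so is every other unescaped whitespace character, including the spaces inside "ProgrammingError: inf can not be used with MySQL". One way to keep the multi-line layout without re.VERBOSE is to hide each newline inside an inline comment: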

    msg = r'''(?#
    )(\(1054, "Unknown column 'inf(e0)?' in 'field list'"\))(?#
    )|(?#
    )(ProgrammingError: inf can not be used with MySQL)(?#
    )'''
    

    Everything between (?# and the next ) is a comment that the regex engine ignores, so the newlines never become part of the pattern.

    Multiline mode is not what you think. re.MULTILINE only changes how ^ and $ behave: they match at the beginning and end of each line instead of only at the beginning and end of the whole string. It has no effect on newlines written inside the pattern.
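
    A minimal sketch of what re.MULTILINE actually changes, using a made-up two-line string (text is an illustrative name, not something from the question):

```python
import re

# Illustrative two-line input; not taken from the question.
text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the very start of the string.
default_hits = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each newline.
multiline_hits = re.findall(r"^\w+", text, re.MULTILINE)

print(default_hits)    # ['first']
print(multiline_hits)  # ['first', 'second']
```

    Since the pattern in the question contains neither ^ nor $, the flag makes no difference to it.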

    Full execution:

    >>> import re
    >>> msg = r'''(?#
    ... )(\(1054, "Unknown column 'inf(e0)?' in 'field list'"\))(?#
    ... )|(?#
    ... )(ProgrammingError: inf can not be used with MySQL)(?#
    ... )'''
    >>> err_text = 'ProgrammingError: inf can not be used with MySQL'
    >>> print(re.search(msg, err_text))
    <re.Match object; span=(0, 48), match='ProgrammingError: inf can not be used with MySQL'>
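
    For contrast, here is a small check of why the original multi-line pattern from the question finds nothing, with or without re.VERBOSE. It reuses msg and err_text from the question; the other variable names are only for illustration:

```python
import re

err_text = 'ProgrammingError: inf can not be used with MySQL'

# The pattern from the question, split across lines.
msg = r"""
(\(1054, "Unknown column 'inf(e0)?' in 'field list'"\))
|
(ProgrammingError: inf can not be used with MySQL)
"""

# With re.VERBOSE the newlines are ignored, but so are the spaces inside
# each alternative, so the pattern only matches text without spaces.
verbose_hit = re.search(msg, err_text, re.VERBOSE)
squashed_hit = re.search(msg, err_text.replace(' ', ''), re.VERBOSE)

# Without re.VERBOSE the newlines are literal characters, and the
# subject string does not contain any.
plain_hit = re.search(msg, err_text)

print(verbose_hit)               # None
print(squashed_hit is not None)  # True
print(plain_hit)                 # None
```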
    

    Here you can find the fixed version of your regex101.


    EDIT: If you don't want to modify the regular expression and only want it to be more readable, split the Python string literal across several lines like this:

    msg = (
        r'''(\(1054, "Unknown column 'inf(e0)?' in 'field list'"\))'''
        r'''|'''
        r'''(ProgrammingError: inf can not be used with MySQL)'''
    )
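
    If you would rather keep the multi-line layout together with re.VERBOSE, another option is to make the significant spaces explicit, because verbose mode keeps whitespace that is escaped with a backslash. A sketch under that assumption (msg_verbose and first_error are names introduced here for illustration, and first_error is a guess at what the first alternative is meant to match):

```python
import re

err_text = 'ProgrammingError: inf can not be used with MySQL'

# Hypothetical rewrite of the question's pattern for re.VERBOSE:
# every literal space is written as "\ " so verbose mode keeps it.
msg_verbose = r"""
(\(1054,\ "Unknown\ column\ 'inf(e0)?'\ in\ 'field\ list'"\))
|
(ProgrammingError:\ inf\ can\ not\ be\ used\ with\ MySQL)
"""

match = re.search(msg_verbose, err_text, re.VERBOSE)
print(match.group(0))  # ProgrammingError: inf can not be used with MySQL

# Assumed example of the other error message, to exercise the first alternative.
first_error = '''(1054, "Unknown column 'inf' in 'field list'")'''
first_match = re.search(msg_verbose, first_error, re.VERBOSE)
print(first_match is not None)  # True
```

    This costs some readability inside each alternative, so the comment-group or string-concatenation versions above may still be preferable.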