python regex backreference

Can't get Python regex backreferences to work


I want to match the docstrings of a Python file. For example:

r""" Hello this is Foo
     """

Matching on just """ should be enough to find the start.

>>> data = 'r""" Hello this is Foo\n     """'
>>> def display(m):
...     if not m:
...             return None
...     else:
...             return '<Match: %r, groups=%r>' % (m.group(), m.groups())
...
>>> import re
>>> print display(re.match('r?"""(.*?)"""', data, re.S))
<Match: 'r""" Hello this is Foo\n     """', groups=(' Hello this is Foo\n     ',)>
>>> print display(re.match('r?(""")(.*?)\1', data, re.S))
None

Can someone please explain to me why the first expression matches and the other does not?


Solution

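  • In an ordinary string literal, Python interprets \1 as an octal escape sequence (the single character \x01) before the regex engine ever sees it. The pattern therefore contains a literal control character, not the backreference \1, and that character never appears in your data.

    You can fix this by escaping the backslash before the 1.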

    print display(re.match('r?(""")(.*?)\\1', data, re.S))
    

    You can also fix it by using a raw string for your regex, in which backslashes are kept literally instead of starting escape sequences.

    print display(re.match(r'r?(""")(.*?)\1', data, re.S))
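To see the difference directly, here is a minimal sketch that compares the two string literals and then runs both patterns against the data from the question. It assumes Python 3 (so it uses the print() function rather than the Python 2 print statement in the session above); the variable names plain, raw, broken and fixed are only illustrative.

```python
import re

data = 'r""" Hello this is Foo\n     """'

# Ordinary literal: '\1' is an octal escape, i.e. one character (U+0001).
plain = '\1'
# Raw literal: the backslash is kept, so this is two characters, '\' and '1'.
raw = r'\1'

print(len(plain), repr(plain))
print(len(raw), repr(raw))

# The pattern ends in the control character \x01, which is not in data.
broken = re.match('r?(""")(.*?)\1', data, re.S)
# The pattern ends in a real backreference to group 1 (the opening quotes).
fixed = re.match(r'r?(""")(.*?)\1', data, re.S)

print(broken)
print(fixed.groups())
```

Note that with the backreference version the opening quotes become group 1, so the docstring body moves to group 2.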