Search code examples
pythonregexbackslashlookbehind

How to use '\' in Python lookbehind assertion regex (?<=\\) to match C++-like quoted strings


How can I match r'\a' in Python using lookbehind assertion?
Actually, I need to match C++ strings like "a \" b" and

"str begin \
end"

I tried:

>>> res = re.compile('(?<=\)a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression

>>> res = re.compile('(?<=\\)a')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 190, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python2.7/re.py", line 244, in _compile
    raise error, v # invalid expression
sre_constants.error: unbalanced parenthesis

>>> res = re.compile('(?<=\\\)a')
>>> ms = res.match(r'\a')
>>> ms is None
True

Real Example:
When I'm parcing "my s\"tr"; 5; like ms = res.match(r'"my s\"tr"; 5;'), the expected output is: "my s\"tr"

Answer
Finally stribizhev provided the solution. I thought my initial regex is less computationally expensive and the only issue was that it should be declared using a raw string:

>>> res = re.compile(r'"([^\n"]|(?<=\\)["\n])*"', re.UNICODE)
>>> ms = res.match(r'"my s\"tr"; 5;')
>>> print ms.group()
"my s\"tr"

Solution

  • EDIT: The final regex is an adaptation from the regex provided at Word Aligned

    I think you are looking for this regex:

    (?s)"(?:[^"\\]|\\.)*"
    

    See demo on regex101.

    Sample Python code (tested on TutorialsPoint):

    import re
    p = re.compile(ur'(?s)"(?:[^"\\]|\\.)*"')
    ms = p.match('"my s\\"tr"; 5;')
    print ms.group(0)