Tags: python, regex, pcre

match everything except three consecutive double quotes


I'm looking for a regex that matches everything except three consecutive double quotes. The problem is that when I use a normal negative lookahead, the consecutive double quotes get gobbled, so it doesn't match what I want.

Let's assume I have the following text:

Lorem Ipsum
"""
sdsdfgsdf
"""
bar

I want a line-by-line regex that matches the first, third and fifth lines, but not the """ lines.

I've tried the following regex: /(?!""").*/, but that's where the double quotes get gobbled. Trying to match one double quote at a time using ["] fails too: /(?!["]["]["]).*/

I'm using Python to match the regex.
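A minimal Python sketch that reproduces the behaviour, assuming each line is tested with re.search (which lets the engine start a match at any position in the line). The helper search_each_line is a hypothetical name used only for illustration. The /(?!["]["]["]).*/ variant behaves the same way, since ["] is just another way of writing ".

```python
import re

# The sample text from the question.
TEXT = '''Lorem Ipsum
"""
sdsdfgsdf
"""
bar'''

# The attempted pattern, used without any anchor.
UNANCHORED = re.compile(r'(?!""").*')

def search_each_line(pattern, text):
    """Return the matched substring (or None) for every line of text."""
    results = []
    for line in text.splitlines():
        m = pattern.search(line)
        results.append(m.group() if m else None)
    return results

# On the """ lines the lookahead fails at position 0, the engine retries
# at position 1 where only "" remains, and .* then matches "".
print(search_each_line(UNANCHORED, TEXT))
# ['Lorem Ipsum', '""', 'sdsdfgsdf', '""', 'bar']
```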

Any ideas how I can make this regex work?


Solution

  • The unanchored pattern (?!""").* matches any character zero or more times, as long as the text immediately to the right of the starting position is not """. Since it is not anchored, on a """ line it matches starting after the first ", because at that position only "" remains and the assertion succeeds.

    Use the anchor ^ to assert the start of the string, and add .* inside the negative lookahead if those three double quotes must not occur anywhere in the string:

    ^(?!.*""").*$
    

    Or, if you only want to exclude lines where the three quotes are the only characters, put just the quotes followed by $ in the lookahead:

    ^(?!"""$).*$
    

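Both anchored patterns can be checked in Python with a small sketch like the one below. It assumes the lines are tested one at a time with re.search, and also shows the re.MULTILINE alternative for running the pattern over the whole text at once. The helper matching_lines and the test line 'a """ b' are hypothetical, used only for illustration.

```python
import re

TEXT = '''Lorem Ipsum
"""
sdsdfgsdf
"""
bar'''

# ^ pins the lookahead to the start of the line, so the engine cannot retry
# from inside the quotes; .* in the lookahead scans the whole line for """.
NO_TRIPLE_ANYWHERE = re.compile(r'^(?!.*""").*$')
# Rejects only a line that is exactly """.
NOT_ONLY_TRIPLE = re.compile(r'^(?!"""$).*$')

def matching_lines(pattern, text):
    """Return the lines of text that the pattern matches."""
    return [line for line in text.splitlines() if pattern.search(line)]

print(matching_lines(NO_TRIPLE_ANYWHERE, TEXT))  # ['Lorem Ipsum', 'sdsdfgsdf', 'bar']
print(matching_lines(NOT_ONLY_TRIPLE, TEXT))     # ['Lorem Ipsum', 'sdsdfgsdf', 'bar']

# The two patterns differ on a line that contains """ among other text.
print(bool(NO_TRIPLE_ANYWHERE.search('a """ b')))  # False
print(bool(NOT_ONLY_TRIPLE.search('a """ b')))     # True

# On the whole text, re.MULTILINE makes ^ and $ match at every line boundary.
print(re.findall(r'^(?!.*""").*$', TEXT, re.MULTILINE))  # ['Lorem Ipsum', 'sdsdfgsdf', 'bar']
```

Which pattern to pick depends on whether a line such as a """ b should be kept (second pattern) or rejected (first pattern).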