regex, regex-lookarounds, negative-lookbehind, nedit

RegEx: Grabbing values between not escaped quotation marks


This question is related to RegEx: Grabbing values between quotation marks

The RegEx from the best answer

(["'])(?:(?=(\\?))\2.)*?\1

also matches strings that start with an escaped double quote (tested with Debuggex). I tried to extend the pattern with a negative lookbehind:

(["'](?<!\\))(?:(?=(\\?))\2.)*?\1


but this does not change what is matched. Any suggestions on how to exclude escaped single or double quotes as the start of a match?

I want to use this as a highlighting pattern in nedit, which supports regex-lookbehind.

Example of the desired matching:

<p>
  <span style="color: #ff0000">"str1"</span> notstr
  <span style="color: #ff0000">"str2"</span>
  \"notstr <span style="color: #ff0000">"str4"</span>
</p>


Solution

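  • In your attempt the lookbehind comes after the quote, so it inspects the quote itself and always succeeds. Put it before the quote, and make it reject a backslash that is not itself preceded by another backslash:

    (?<!(?<!\\)\\)["']

    Applying this to both the opening and the closing quote solves the problem: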

    ((?<!(?<!\\)\\)["'])(?:(?=(\\?))\2.)*?(?<!(?<!\\)\\)\1
    


    You should be very careful about this approach, because generally regex is not a good tool for parsing inputs in markup syntax. You would be better off using a full-scale parser, and then optionally applying regex to parts that you get back from it.
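To check the behaviour outside nedit, here is a minimal sketch in Python. Using Python's `re` module is an assumption on my part (nedit's regex flavour may differ in details), and the sample string and the helper `matches` are made up for illustration. It runs the original pattern, the attempt from the question, and the pattern above against the same input.

```python
import re

# Sample text modeled on the question's example (hypothetical input);
# the \" must not open a match.
text = r'"str1" notstr "str2" \"notstr "str4"'

# Original pattern from the linked answer.
original = re.compile(r'''(["'])(?:(?=(\\?))\2.)*?\1''')

# Attempt from the question: the lookbehind sits after the quote, so it only
# checks that the quote itself is not a backslash, which is always true.
attempt = re.compile(r'''(["'](?<!\\))(?:(?=(\\?))\2.)*?\1''')

# Pattern from this answer: the lookbehind precedes the quote.
fixed = re.compile(
    r'''((?<!(?<!\\)\\)["'])(?:(?=(\\?))\2.)*?(?<!(?<!\\)\\)\1'''
)


def matches(pattern, s):
    """Return the full text of every match of pattern in s (illustrative helper)."""
    return [m.group(0) for m in pattern.finditer(s)]


print(matches(original, text))
print(matches(attempt, text))
print(matches(fixed, text))
```

On this input, `original` and `attempt` return the same list, which treats the escaped quote as an opening quote and therefore captures `"notstr "` and misses `"str4"`. The `fixed` pattern returns `"str1"`, `"str2"` and `"str4"`.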