Tags: regex, grep, replace, textwrangler

Regex to match part of string, when match does not contain a specific string - PCRE grep


I'm using TextWrangler grep to perform find/replace on multiple files and have run into a wall with the last find/replace I need to perform. I need to match any text between "> and the first instance of a <br /> in a line but the match cannot contain the character sequence [xcol]. The regex flavor is Perl-Compatible (PCRE) so lookbehind needs to be fixed-length.

Example Text to Search:

<p class="x03">FooBar<br />Bar</p>
<p class="x03">FooBar [xcol]<br />Bar</p>
<p class="x06">Hello World<br />[xcol]foo[xcol]bar<br /></p>
<p class="x07">Hello World[xcol]<br />[xcol]foo[xcol]bar<br /></p>  

Desired behavior of regex:
1st Line match ">FooBar<br />
2nd Line no match
3rd Line match ">Hello World<br />
4th Line no match

The text between "> and the <br /> will be captured in a group to be used with the replace function. The closest I got was using the following regex with negative lookahead, but this will not match the 3rd line as desired:

">((?!.*?\[xcol]).*?)<br />

Any help or advice is appreciated. Thank you.


Solution

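  • Try this regex:

    ">((?:(?!\[xcol]).)*?)<br\s*/>
    

    A (short) explanation:

    ">               # match '">'
    (                # start group 1 (captures the whole text)
      (?:            #   start non-capturing group
        (?!\[xcol]). #     if '[xcol]' can't be seen ahead, match any character (except line breaks)
      )*?            #   end non-capturing group, repeat as few times as possible
    )                # end group 1
    <br\s*/>         # match '<br />' (also '<br/>')

    The repetition sits inside group 1 because a repeated capture group such as (x)* retains only its last iteration, which here would be a single character. The lazy *? makes the match stop at the first <br /> rather than the last one on the line.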
    

    If the . should also match line breaks, either enable DOT-ALL mode (put (?s) at the start of the pattern) or replace the . with something like [\s\S].
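To check the approach against the four sample lines from the question, here is a minimal sketch using Python's `re` module as a stand-in for TextWrangler's PCRE grep (an assumption: the two engines agree for this pattern, but this was not run in TextWrangler itself). The pattern wraps the repeated part in one outer capturing group with a lazy quantifier so that group 1 holds the whole text up to the first `<br />`. The names `PATTERN`, `LINES`, and `first_segment` are hypothetical helpers for illustration only.

```python
import re

# Sample lines copied from the question.
LINES = [
    '<p class="x03">FooBar<br />Bar</p>',
    '<p class="x03">FooBar [xcol]<br />Bar</p>',
    '<p class="x06">Hello World<br />[xcol]foo[xcol]bar<br /></p>',
    '<p class="x07">Hello World[xcol]<br />[xcol]foo[xcol]bar<br /></p>',
]

# Consume a character only when '[xcol]' does not start at that position.
# The outer group captures the whole run; the lazy *? stops at the first <br />.
PATTERN = re.compile(r'">((?:(?!\[xcol]).)*?)<br\s*/>')


def first_segment(line):
    """Return the text between '">' and the first <br />, or None if no match."""
    match = PATTERN.search(line)
    return match.group(1) if match else None


results = [first_segment(line) for line in LINES]
for number, captured in enumerate(results, start=1):
    print(number, captured)
```

Lines 1 and 3 yield `FooBar` and `Hello World`; lines 2 and 4 yield no match, which is the behavior the question asks for. In a replace, the captured text would be referenced as group 1 (`\1` in TextWrangler's replacement syntax).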