regex, mariadb, mediawiki, pcre

Regex to match a string if not followed by another string


In MediaWiki, using the Replace extension (MariaDB 10.6), I want to match the string <span class="sense"><span class="bld">A</span> and delete it, but only if no other <span class="bld"> appears further down that line. Here is an example of text where it should not match:

<span class="sense"><span class="bld">A</span> [[lay bare at the side]], [[expose]], τι τῆς πλευρᾶς <span class="bibl">Arr. <span class="title">Tact.</span>40.5</span>, cf. <span class="bibl">D.C.49.6</span> (Pass.). </span><span class="sense"><span class="bld">2</span> metaph., [[lay bare]], [[disclose]], τὸν πάντα λόγον <span class="bibl">Hdt.1.126</span>, cf. <span class="bibl">8.19</span>, <span class="bibl">9.44</span>; τὸ βούλευμα <span class="bibl">Conon 50</span>:—Pass., <b class="b3">παρεγυμνώθη διότι</b>… <span class="bibl">Plb.1.80.9</span>.</span>

So far I have tried (<span class="sense"><span class="bld">A<\/span>) ((?!<span class="bld">).*) (replacing with nothing), but it also matches instances that do contain the unwanted string.


Solution

  • You can use

    <span class="sense"><span class="bld">A<\/span>(?s)(?!.*<span class="bld">)
    

    See the regex demo. Details:

    • <span class="sense"><span class="bld">A<\/span> - a literal <span class="sense"><span class="bld">A</span> string
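    • (?s) - the inline s (DOTALL) flag, which lets . match line break characters too, so the lookahead checks the rest of the text rather than only the current line
    • (?!.*<span class="bld">) - a negative lookahead that fails the match if the text to the right of the current location consists of
      • .* - any zero or more chars, as many as possible, followed by
      • <span class="bld"> - the literal string <span class="bld">.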
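To check the behaviour outside MediaWiki, here is a minimal sketch in Python. It assumes Python's `re` module treats this lookahead the same way PCRE does, which holds for this pattern but is not a test of MariaDB itself. The function name and the two sample strings are made up for illustration (the samples are shortened from the question's example), and `re.DOTALL` stands in for the inline `(?s)` because recent Python versions only accept global inline flags at the very start of a pattern.

```python
import re

# Literal prefix to delete, taken from the question.
PREFIX = '<span class="sense"><span class="bld">A</span>'

# Same idea as the answer's pattern: the prefix, then a negative lookahead
# that rejects the match if <span class="bld"> occurs anywhere after it.
PATTERN = re.compile(re.escape(PREFIX) + r'(?!.*<span class="bld">)', re.DOTALL)

# The pattern tried in the question, kept only to show why it over-matches:
# its lookahead inspects a single position, then .* consumes everything.
ATTEMPT = re.compile(
    r'(<span class="sense"><span class="bld">A</span>) ((?!<span class="bld">).*)'
)


def strip_lone_sense_marker(text: str) -> str:
    """Remove PREFIX only when no later <span class="bld"> follows it."""
    return PATTERN.sub("", text)


# Hypothetical samples, shortened from the question's example text.
single_sense = PREFIX + " [[lay bare]], [[expose]].</span>"
multi_sense = (
    PREFIX
    + ' [[expose]]. </span><span class="sense"><span class="bld">2</span> metaph.</span>'
)

print(strip_lone_sense_marker(single_sense))                # prefix removed
print(strip_lone_sense_marker(multi_sense) == multi_sense)  # True: left untouched
print(ATTEMPT.search(multi_sense) is not None)              # True: the attempt over-matches
```

The original attempt fails because `(?!<span class="bld">)` only tests the characters directly at that one position. Putting `.*` inside the lookahead is what makes it scan ahead. If the check should stop at the end of the current line instead of the end of the text, dropping the `s` flag should be enough, since `.` then no longer crosses line breaks.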