regex pcre

regex get multiline tag having a word and not another


I have to replace all the <img> tags that contain one text (dog) but do not contain another (cat), in a multiline text.

So having this text:

<img black 
dog>
<img dog white cat>
<img black dog>
<img cat and dog>
<img red fox>
<img black dog>

The following tags should be found (the three dog tags that contain no cat):

<img black 
dog>
<img black dog>
<img black dog>

There are many ways to do this for single-line input using ^ and $, but I have not been able to do it for multiline input.

My first attempt was using the single line option (/s) this way:

/<img ((?!cat).)*?(dog)>/gs

But it selects the tag before the last dog (<img red fox>) because it is not greedy enough.

Then I tried to make it greedy (adding a ?) without the /s option, using \s\S instead:

/<img ((?!cat)[\s\S.])*?(dog)?>/g

And the fifth tag (<img red fox>) is matched again, even though it contains no dog.

How can I get my 3 dogs selected with no cats or foxes?

Link to my attempt in regex101: https://regex101.com/r/AGgb4z/1


Solution

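  • You could match <img, then use a negative lookahead (?![^<>]*cat) to assert that cat does not occur before the closing >.

    Use a negated character class [^<>]* on the left and the right of dog. It matches any character except < and >, including newlines, so the match can span lines without the /s flag but can never run past the end of the tag.

    If cat and dog should not be part of a longer word, you could add word boundaries, for example \bcat\b and \bdog\b.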

    <img (?![^<>]*cat)[^<>]*dog[^<>]*>
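
As a quick check, here is a minimal sketch that runs the pattern above against the sample text from the question. It uses Python's re module rather than PCRE, on the assumption that the two behave the same for this pattern. The names TEXT, PATTERN, STRICT and the replacement string "<img replaced>" are made up for illustration only.

```python
import re

# Sample text from the question; the first tag spans two lines.
TEXT = (
    "<img black \ndog>\n"
    "<img dog white cat>\n"
    "<img black dog>\n"
    "<img cat and dog>\n"
    "<img red fox>\n"
    "<img black dog>\n"
)

# Pattern from the answer: no "cat" before the closing ">", and "dog" somewhere in the tag.
PATTERN = re.compile(r"<img (?![^<>]*cat)[^<>]*dog[^<>]*>")

# Variant with word boundaries, so "hotdog" or "category" do not count.
STRICT = re.compile(r"<img (?![^<>]*\bcat\b)[^<>]*\bdog\b[^<>]*>")

matches = PATTERN.findall(TEXT)
for m in matches:
    print(repr(m))

# Hypothetical replacement string, only to show the substitution step.
replaced = PATTERN.sub("<img replaced>", TEXT)
print(replaced)
```

With the sample text, both patterns should select only the three dog tags. They differ on input such as <img hotdog>, which the first pattern accepts and the word-boundary variant rejects.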
    
