regex html-parsing

Regex match (replace) all occurrences of double quotes in words between span tags


I'm trying to replace all occurrences of " between two span tags.

I use:

(?<=<span>[a-zA-Z0-9_æøåÆØÅ_,.;:!#€%&\/()$§'])*(\")(?=[a-zA-Z0-9_æøåÆØÅ_,.;:!#€%&\/()$§']*<\/span>)

Lookbehind for letters+specialChars

find "

Lookahead for letters+specialChars

But with the html string

<span>d"s"s"</span>

It only matches the last occurrence of ".

How can I match (eventually replace) all occurrences of double quotes within the tag?

Thanks in advance.


Solution

  • Don't bother with the lookbehind. Instead, match " where </span> follows without <span> appearing before that </span>, i.e. the " is inside a span open/close pair:

    "(?=((?!<span>).)*<\/span>)
    

    Breaking down the regex:

    • " a literal quote
    • (?!<span>). any single character, as long as <span> does not start at that position
    • ((?!<span>).)* any run of characters up to, but not including, the < of the next <span>
    • (?=((?!<span>).)*<\/span>) followed by input that encounters </span> before <span>
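
To try the pattern outside a regex tester, here is a minimal Python sketch. The choice of Python's re module, the helper name replace_quotes_in_spans, and the &quot; replacement are assumptions for illustration and are not part of the original answer. The inner group is made non-capturing so that findall returns the matched quotes themselves.

```python
import re

# The answer's pattern: a quote that is followed by </span>
# with no <span> appearing before that </span>.
# (?: ... ) is used instead of ( ... ) so findall returns whole matches.
QUOTE_IN_SPAN = re.compile(r'"(?=(?:(?!<span>).)*</span>)')


def replace_quotes_in_spans(html: str, replacement: str = "&quot;") -> str:
    """Replace every double quote inside a <span>...</span> pair (hypothetical helper)."""
    return QUOTE_IN_SPAN.sub(replacement, html)


sample = '<span>d"s"s"</span>'
print(QUOTE_IN_SPAN.findall(sample))    # ['"', '"', '"']
print(replace_quotes_in_spans(sample))  # <span>d&quot;s&quot;s&quot;</span>

# Quotes outside the span pair are left alone.
mixed = 'a "b" <span>c"d</span> "e"'
print(replace_quotes_in_spans(mixed))   # a "b" <span>c&quot;d</span> "e"
```

Two limits of this sketch: the pattern only recognises a bare <span> opening tag, so a tag with attributes such as <span class="x"> would need a different pattern, and . does not cross line breaks unless re.DOTALL is passed.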