regex pcre substitution

Regex substitution: Replace text, not code


I've been trying to solve a regex quiz for days, but I still can't get it right. I'm getting close, but my attempt still doesn't pass.

Task:

In an HTML page, replace the text micro with &micro;. Oh, and don't screw up the code: don't replace inside <the tags> or &entities;

Replace

  • micro -> &micro;
  • abc micro -> abc &micro;
  • micromicro -> &micro;&micro;
  • &micro;micro -> &micro;&micro;

Don't touch

  • <tag micro /> -> <tag micro />
  • &micro; -> &micro;
  • &abcmicro123; -> &abcmicro123;

I tried the following, but it fails on the last &micro;. Can someone point out what I missed? Thanks in advance!

What I have tried:

Regex

((?:\G|\n)(?:.*?&.*?micro.*?;[\s\S]*?|.*?<.*?micro.*?>[\s\S]*?|.)*?)micro

Substitution

$1&micro;

Solution

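  • You can try something like this (PCRE). The first alternative matches a whole tag or entity, and (*SKIP)(*F) then discards that match without letting the engine retry inside it, so only a micro outside tags and entities is left for the second alternative: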

    (?:<.*?>|&\w++;)(*SKIP)(*F)|micro

    replacement string:

    &micro;
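
If the engine at hand has no backtracking control verbs, the same idea can be approximated with a replacement callback. The sketch below uses Python's standard `re` module, which does not support (*SKIP)(*F). It is a swapped-in technique, not the PCRE pattern itself: tags and entities are matched first and written back unchanged, and only a bare micro (capture group 1) is replaced. The names `PATTERN` and `escape_micro` are made up for illustration, and the sketch assumes a tag does not span several lines and contains no `>` inside attribute values.

```python
import re

# Alternatives are tried left to right at each position, so a tag or an
# entity is consumed whole before the bare "micro" branch gets a chance.
# Only that last branch has a capture group.
PATTERN = re.compile(r"<.*?>|&\w+;|(micro)")


def escape_micro(html: str) -> str:
    """Replace 'micro' with '&micro;' outside of tags and entities."""

    def repl(match: re.Match) -> str:
        # Group 1 is set only when the bare "micro" branch matched;
        # otherwise return the tag or entity exactly as it was.
        return "&micro;" if match.group(1) else match.group(0)

    return PATTERN.sub(repl, html)


if __name__ == "__main__":
    # The cases listed in the question.
    cases = {
        "micro": "&micro;",
        "abc micro": "abc &micro;",
        "micromicro": "&micro;&micro;",
        "&micro;micro": "&micro;&micro;",
        "<tag micro />": "<tag micro />",
        "&micro;": "&micro;",
        "&abcmicro123;": "&abcmicro123;",
    }
    for source, expected in cases.items():
        assert escape_micro(source) == expected, source
```

As with the PCRE pattern, this is a regex-level workaround rather than real HTML parsing, so comments, scripts, or attribute values containing `>` would need separate handling.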