javascript regex recursive-regex

Better Way to write this to escape HTML content


I have a string containing rich text content, for example:

<p>Hello</p>

<br/>

<p> Christian </p>

<pre> Don't Know what to do </pre>

Now, I don't want any script tags to be present in the above content; if one is present, it should be escaped.

So if I have content that looks like this:

<p>Hello</p>

<br/>

<p> Christian </p>
<script type="text/javascript"> alert("Hello")</script>
<pre> Don't Know what to do </pre>

it needs to be replaced with:

<p>Hello</p>

<br/>

<p> Christian </p>
&lt;script type="text/javascript"&gt; alert("Hello")&lt;/script&gt;
<pre> Don't Know what to do </pre>

I have currently written a regex for it, and my code looks something like this:

// Escape the first opening <script ...> tag, if any
var open = content.match(/<script(.+?)>/);
if (open) {
  content = content.replace(open[0], open[0].replace("<", "&lt;").replace(">", "&gt;"));
}
// Escape the first closing </script> tag, if any
var close = content.match(/<\/script\s*>/);
if (close) {
  content = content.replace(close[0], close[0].replace("<", "&lt;").replace(">", "&gt;"));
}

so that the resulting content has its script tags escaped.

Can anyone suggest a cleaner way to achieve this?


Solution

  • Cleaner:

    content = content.replace(/<(script[^>]*|\/script)>/g, '&lt;$1&gt;');
    

    However, this is probably not the way to go about this. Why are these <script> tags in the JS string in the first place?
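
For illustration, the one-line replacement above could be wrapped in a small function and run against the sample content from the question. This is only a sketch: the function name `escapeScriptTags` is hypothetical, and the `\b`, `\s*`, and `i` additions to the regex are assumptions that go slightly beyond the answer's pattern (they also catch `<SCRIPT>` and `</script >`). It escapes script tags only and is not a general HTML sanitizer.

```javascript
// Hypothetical helper; the name is not from the original post.
// Escapes only the angle brackets of <script ...> and </script> tags,
// leaving all other markup untouched.
function escapeScriptTags(content) {
  // Group 1 captures either "script" plus any attributes, or "/script"
  // followed by optional whitespace. The "g" flag replaces every
  // occurrence; the "i" flag (an addition here) ignores case.
  return content.replace(/<(script\b[^>]*|\/script\s*)>/gi, '&lt;$1&gt;');
}

var input = '<p> Christian </p>\n' +
  '<script type="text/javascript"> alert("Hello")</script>\n' +
  '<pre> Don\'t Know what to do </pre>';

console.log(escapeScriptTags(input));
```

Unlike the two `if` blocks in the question, a single call with the `g` flag handles every opening and closing script tag in the string, not just the first of each.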