javascript, sanitization, strip

Strip JavaScript from HTML DOM Tree with JavaScript


How can one sanitize an HTML DOM tree of all JavaScript occurrences, meaning on(click|mouseover|etc.) attributes, href="javascript:..." URLs, <script> elements, and all other possible variants of (inline) JavaScript, using JavaScript itself?

For example: I want users to upload their HTML file, then copy the contents of its <body> tag and insert them into one of my pages. I don't want to allow JavaScript in their HTML files. I could use <iframe sandbox>, but I wondered whether there is another way.


Solution

  • The following uses the Element.attributes collection to disable inline on* event handlers and any attribute whose value contains the word "javascript", without affecting other DOM attributes.

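    function disableInlineJS() {
      // Every element in the document, including <html> and <body>.
      var elements = document.querySelectorAll('*');

      for (var i = 0; i < elements.length; i++) {
        var attrs = elements[i].attributes;
        // Index-based loop, walking backwards because attributes is a live
        // NamedNodeMap that shrinks as entries are removed.
        for (var j = attrs.length - 1; j >= 0; j--) {
          var attr = attrs[j];
          // In an HTML document attribute names are reported in lowercase,
          // so "ONmouseout" is matched as well.
          if (attr.name.indexOf('on') === 0 ||
              attr.value.toLowerCase().indexOf('javascript') > -1) {
            elements[i].removeAttribute(attr.name);
          }
        }
      }
    }

    <button onclick="disableInlineJS()">Disable inline JavaScript</button><hr>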
    
    <div onmouseover="this.style.background= 'yellow';" ONmouseout="this.style.background= '';" style="font-size:25px; cursor: pointer;">
      Hover me
      <br>
      <a href="javAsCriPT:alert('gotcha')" style="font-weight:bold">Click me!</a>
      <br>
      <a href="http://example.com">Example.com!</a>
    </div>
    <button onclick="alert('gotcha')">Me, me!</button>

    I don't think there's a way to remove <script> elements that are already part of the page before they've had a chance to run.
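
For the upload case in the question, the markup does not have to touch the live page first. As a rough sketch, assuming the uploaded file is available as a string, it can be parsed with DOMParser into a separate document (where, as far as I know, scripts are not executed during parsing), cleaned there, and only then inserted. The names URL_ATTRIBUTES, isScriptAttribute and sanitizeHtml are made up for this example, and the attribute list is an assumption rather than a complete inventory:

```javascript
// Attributes that take a URL and could carry a "javascript:" scheme.
// Illustrative assumption only; this list is not exhaustive.
var URL_ATTRIBUTES = ['href', 'src', 'action', 'formaction', 'xlink:href'];

// Hypothetical helper: decides whether a single attribute should be dropped.
function isScriptAttribute(name, value) {
  var lowerName = name.toLowerCase();
  if (lowerName.indexOf('on') === 0) {
    return true; // inline event handler such as onclick or onmouseover
  }
  if (URL_ATTRIBUTES.indexOf(lowerName) === -1) {
    return false; // e.g. title="Learn JavaScript" is left alone
  }
  // Drop whitespace and control characters before checking the scheme.
  // This is deliberately stricter than URL parsing, so it may over-block.
  var compact = value.replace(/[\u0000-\u0020]/g, '').toLowerCase();
  return compact.indexOf('javascript:') === 0;
}

// Hypothetical helper: parse untrusted markup in a separate document,
// strip <script> elements and script-carrying attributes, and return the
// cleaned <body> contents as a string. Requires a browser (DOMParser).
function sanitizeHtml(html) {
  var doc = new DOMParser().parseFromString(html, 'text/html');

  // querySelectorAll returns a static list, so removing nodes while
  // looping over it is safe.
  var scripts = doc.querySelectorAll('script');
  for (var i = 0; i < scripts.length; i++) {
    scripts[i].parentNode.removeChild(scripts[i]);
  }

  var elements = doc.querySelectorAll('*');
  for (var j = 0; j < elements.length; j++) {
    var attrs = elements[j].attributes;
    // Walk backwards because attributes is a live collection.
    for (var k = attrs.length - 1; k >= 0; k--) {
      if (isScriptAttribute(attrs[k].name, attrs[k].value)) {
        elements[j].removeAttribute(attrs[k].name);
      }
    }
  }
  return doc.body.innerHTML;
}
```

In a browser this could be used as container.innerHTML = sanitizeHtml(uploadedHtml), where container and uploadedHtml are placeholders for your own element and the uploaded file's text. This only covers the variants named in the question. Other elements and attributes can also carry script, so for untrusted input the <iframe sandbox> mentioned in the question is probably still the safer option.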