How can one possibly sanitize an HTML DOM tree of all JavaScript occurrences, meaning on(click|mouseover|etc), href:javascript..., <script>, and all other possible variants of (inline) JavaScript, while using JavaScript?
For example: I want users to upload their HTML file, copy the contents of the <body> tag, and insert them into one of my pages. I don't want to allow JavaScript in their HTML files. I could use <iframe sandbox>, but I wondered whether there is another way.
The following uses the Element.attributes collection to remove inline on* handlers and attributes containing the word "javascript", without affecting other DOM attributes.
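function disableInlineJS() {
  // Visit every element in the document.
  var elements = document.querySelectorAll('*');
  for (var i = 0; i < elements.length; i++) {
    var attrs = elements[i].attributes;
    // Loop by index rather than for...in, which also enumerates
    // non-attribute members such as length and item. Go backwards
    // because the collection is live and shrinks on removal.
    for (var j = attrs.length - 1; j >= 0; j--) {
      var attr = attrs[j];
      if (attr.name.toLowerCase().indexOf('on') === 0 ||
          attr.value.toLowerCase().indexOf('javascript') > -1) {
        // Remove the attribute instead of leaving an empty handler behind.
        elements[i].removeAttribute(attr.name);
      }
    }
  }
}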
<button onclick="disableInlineJS()">Disable inline JavaScript</button><hr>
<div onmouseover="this.style.background= 'yellow';" ONmouseout="this.style.background= '';" style="font-size:25px; cursor: pointer;">
Hover me
<br>
<a href="javAsCriPT:alert('gotcha')" style="font-weight:bold">Click me!</a>
<br>
<a href="http://example.com">Example.com!</a>
</div>
<button onclick="alert('gotcha')">Me, me!</button>
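One caveat with the test above: matching the word "javascript" anywhere in a value also strips harmless attributes (say, alt="Learn JavaScript"), while URL parsers ignore leading whitespace and embedded tabs or newlines, so "java<TAB>script:" gets past a plain substring check. Here is a rough sketch of a stricter check, assuming you only care about on* attributes and javascript: URLs. The name isScriptAttribute is made up for this example, and it is not an exhaustive filter.

```javascript
// Hypothetical helper, not part of the snippet above.
// Returns true when an attribute looks like it carries inline script.
function isScriptAttribute(name, value) {
  // Inline event handlers: onclick, ONmouseout, ...
  if (/^on/i.test(name)) {
    return true;
  }
  // URL parsing drops leading control characters and spaces, plus any
  // tab or newline inside the URL, so normalize before checking the scheme.
  var normalized = String(value)
    .replace(/^[\u0000-\u0020]+/, '')
    .replace(/[\t\n\r]/g, '')
    .toLowerCase();
  return normalized.indexOf('javascript:') === 0;
}
```

Inside the attribute loop it could replace the two-part condition, for example: if (isScriptAttribute(attr.name, attr.value)) { ... }.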
I don't think there's a way to remove script elements before they've had a chance to run.
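That applies to markup that is already part of the live page. For the upload case in the question, where the HTML arrives as a string, one option might be to parse it with DOMParser first: script elements are not evaluated while the string is parsed, so they can be removed before anything is inserted into your page. A minimal sketch, assuming a browser environment; stripScripts and sanitizeHtmlString are hypothetical names, and the attribute test is the same blacklist-style check as in the answer above, so treat it as a starting point rather than a complete sanitizer.

```javascript
// Hypothetical helper: removes <script> elements and script-bearing
// attributes from an already-parsed Document, then returns the body markup.
function stripScripts(doc) {
  // Drop <script> elements outright.
  var scripts = doc.querySelectorAll('script');
  for (var i = 0; i < scripts.length; i++) {
    scripts[i].remove();
  }
  // Strip on* handlers and values containing "javascript".
  var elements = doc.querySelectorAll('*');
  for (var k = 0; k < elements.length; k++) {
    var attrs = elements[k].attributes;
    // Backwards, because the attribute collection shrinks on removal.
    for (var j = attrs.length - 1; j >= 0; j--) {
      var name = attrs[j].name.toLowerCase();
      var value = attrs[j].value.toLowerCase();
      if (name.indexOf('on') === 0 || value.indexOf('javascript') > -1) {
        elements[k].removeAttribute(attrs[j].name);
      }
    }
  }
  return doc.body.innerHTML;
}

// Browser-only wrapper: assumes DOMParser is available.
function sanitizeHtmlString(html) {
  var doc = new DOMParser().parseFromString(html, 'text/html');
  return stripScripts(doc);
}
```

Usage would be something like container.innerHTML = sanitizeHtmlString(uploadedHtml), where container and uploadedHtml stand in for your own element and the uploaded file's text.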