How to strip HTML tags from string in JavaScript?


How can I strip the HTML from a string in JavaScript?


Solution

  • Using the browser's parser is probably the best bet in current browsers. The following will work, with these caveats:

    • Your HTML is valid within a <div> element. HTML contained within <body> or <html> or <head> tags is not valid within a <div> and may therefore not be parsed correctly.
    • The textContent (DOM standard) and innerText (originally non-standard) properties are not identical. For example, textContent includes text within a <script> element while innerText does not (in most browsers). The innerText fallback only matters for IE <=8, the only major browser that does not support textContent.
    • The HTML does not contain <script> elements.
    • The HTML is not null.
    • The HTML comes from a trusted source. Using this with arbitrary HTML allows arbitrary untrusted JavaScript to be executed. This example is from a comment by Mike Samuel on the duplicate question: <img onerror='alert("could run arbitrary JS here")' src=bogus>

    Code:

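    var html = "<p>Some HTML</p>";
    // Let the browser parse the markup inside a detached <div>.
    var div = document.createElement("div");
    div.innerHTML = html;
    // Prefer textContent; fall back to innerText for IE <=8.
    var text = div.textContent || div.innerText || "";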
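
If the HTML might be null or might not come from a trusted source, one possible variation is to parse it with DOMParser instead of assigning to innerHTML. This is only a sketch and is not part of the original answer: the function name stripHtml and its optional parser argument are hypothetical, and the parser argument exists only so the function can be exercised outside a browser. DOMParser builds a separate document that is never attached to the page, so scripts and inline handlers such as the onerror example above should not run. Note that textContent will still include the text inside any <script> elements.

```javascript
// Hypothetical helper; the name and the optional `parser` argument are
// illustrative assumptions, not part of the original answer.
function stripHtml(html, parser) {
  // Guard against null/undefined input (one of the caveats above).
  if (html == null) {
    return "";
  }
  // DOMParser builds a separate, inert document: the markup is never
  // attached to the live page.
  parser = parser || new DOMParser();
  var doc = parser.parseFromString(String(html), "text/html");
  // For "text/html" input the parser always produces a <body> element.
  return doc.body.textContent || "";
}

// Usage in a browser:
// var text = stripHtml("<p>Some HTML</p>");
```

DOMParser is not available in IE <=8, so the innerHTML approach above remains the fallback for those browsers.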