How to remove html tag from string but keep html entities intact


I would like to remove the HTML tags (elements) from a string without touching the HTML entities, such as &nbsp;, &amp;, &eacute;, &lt;, etc.

For now I am using this:

    var stringWithTag = "<i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i>";
    var div = document.createElement('div');
    div.innerHTML = stringWithTag;

    console.log("INPUT with html entity &nbsp;");
    console.log(stringWithTag);

    // textContent returns the decoded text, so &nbsp; comes back as a plain non-breaking space character
    var htmlNoTag = div.textContent || div.innerText || "";
    console.log("\nOUTPUT that should still have entity &nbsp;, but not...");
    console.log(htmlNoTag);

See the jsfiddle: https://jsfiddle.net/az4st8LL/

But I always lose the entity (in this example, &nbsp; should still be visible, but it is not). I would like to avoid using a regex to remove all HTML tags if possible.

Does anyone have a solution to this?

Thanks,


Solution

  • Since you want to avoid using regex, try

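    function stripTags(html) {
      var result = "";
      var add = true, c; // add: true while we are outside a tag
      for (var i = 0; i < html.length; i++) {
        c = html[i];
        if (c === '<') add = false;       // a tag starts: stop copying
        else if (c === '>') add = true;   // a tag ends: resume copying
        else if (add) result += c;        // plain text and entities are copied as-is
      }
      return result;
    }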
    

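    This will not work for your exact string, <i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i>, because the literal > and < in -> and <- are treated as tag delimiters: both characters are removed, and the text between <- and the next > is dropped. It will, however, turn <i> I want to keep my &nbsp; element space, but remove the tags <b>Please Help</b></i> into I want to keep my &nbsp; element space, but remove the tags Please Help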
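
If the literal -> and <- in the original string need to survive, one possible variation is to treat < as the start of a tag only when the next character is a letter, / or !. The sketch below is not part of the answer above: stripTagsKeepEntities and looksLikeTagStart are hypothetical names, and it assumes no tag has a > inside an attribute value. It never decodes entities and uses no regex.

```javascript
// Hypothetical helper: true if ch could begin a tag name, a closing tag, or a comment/doctype.
function looksLikeTagStart(ch) {
  return (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z") || ch === "/" || ch === "!";
}

// Hypothetical variation of stripTags: removes tags, keeps entities and stray < or > characters.
// Assumption: attribute values never contain '>'.
function stripTagsKeepEntities(html) {
  var result = "";
  var i = 0;
  while (i < html.length) {
    var c = html[i];
    var next = i + 1 < html.length ? html[i + 1] : "";
    if (c === "<" && looksLikeTagStart(next)) {
      var end = html.indexOf(">", i + 1);
      if (end === -1) {
        // Unterminated tag: keep the remainder as plain text.
        result += html.slice(i);
        break;
      }
      i = end + 1; // skip the whole tag, including '>'
    } else {
      result += c; // text, entities, and stray '<' or '>' are copied unchanged
      i++;
    }
  }
  return result;
}

var input = "<i> I want to keep my ->&nbsp;<- element space, but remove the tags <b>Please Help</b></i>";
console.log(stripTagsKeepEntities(input));
// prints " I want to keep my ->&nbsp;<- element space, but remove the tags Please Help"
```

Because the string is never handed to the browser's HTML parser, entities such as &nbsp; stay as written. The trade-off is that this is only a character scan, not a real parser, so unusual markup may not be handled the way a browser would handle it.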