Tags: javascript, html, regex, newline, pre

JavaScript: remove \n or \t in HTML content except within pre tags


In JavaScript, how can I remove line breaks and tabs (\n or \t) from an HTML string, except within <pre> tags?

I use this code to remove them:

htmlString.replace(/[\n\t]+/g,"");

However, it also removes \n and \t inside <pre> tags. How can I fix it?


Solution

  • Start by matching the text that needs to be cleaned, which can only be one of the following:

    • Text from the beginning of the string to the next opening <pre> tag.
    • Text from a closing </pre> tag to the next opening <pre> tag.
    • Text from a closing </pre> tag to the end of the string.
    • Text from the beginning of the string to the end of the string (no pre elements in the string).

    These cases can be described with the regex:

    /(?:^|<\/pre>)[^]*?(?:<pre>|$)/g
    

    where [^] matches any character, including newlines (it is an empty negated character class, a JavaScript-specific idiom), and *? is a non-greedy quantifier that matches as few characters as possible.
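To see which stretches of text the pattern picks out, it can help to list the matches before replacing anything. The following is a minimal sketch; the `sample` string and the variable names are made up for illustration and are not part of the original answer.

```javascript
// Pattern from the answer: each match is a stretch of text outside <pre> content.
var outsidePre = /(?:^|<\/pre>)[^]*?(?:<pre>|$)/g;

// Hypothetical sample input with two <pre> blocks.
var sample = "<p>\n\ta\n</p>\n<pre>\n\tkeep\n</pre>\n\n<pre>\n\tkeep too\n</pre>\n";

// With a global regex, String.prototype.match returns every matched substring.
var segments = sample.match(outsidePre);

segments.forEach(function (segment, i) {
    // JSON.stringify makes the \n and \t characters visible in the output.
    console.log(i, JSON.stringify(segment));
});
```

Each match includes the `<pre>` and `</pre>` tags that delimit it, but never the text between an opening `<pre>` and its closing `</pre>`, which is why the contents of the pre elements are left alone.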


    Next, each matched stretch of text is cleaned with the regex /[\n\t]+/g, by passing a callback function to replace.
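The two steps can be wrapped in a small helper that runs without a DOM (for example in Node.js). This is only a sketch: the function name `removeBreaksOutsidePre` and the sample input are hypothetical, while the two regexes are the ones described above.

```javascript
// Hypothetical helper name; it combines the two regexes from the answer.
function removeBreaksOutsidePre(html) {
    // Outer regex: select each stretch of text outside <pre> content.
    return html.replace(/(?:^|<\/pre>)[^]*?(?:<pre>|$)/g, function (outside) {
        // Inner regex: drop runs of newlines and tabs inside that stretch only.
        return outside.replace(/[\n\t]+/g, "");
    });
}

// Hypothetical sample input.
var input = "<p>\n\tLorem\tIpsum\n</p>\n<pre>\n\tHello, World!\n</pre>\n";
var output = removeBreaksOutsidePre(input);

// JSON.stringify makes the remaining \n and \t characters visible.
console.log(JSON.stringify(output));
```

Note that tabs between words are removed as well, so "Lorem\tIpsum" becomes "LoremIpsum"; replacing with a single space instead of an empty string would avoid that, if that is the desired behaviour.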


    Example:

    // JavaScript
    var htmlString = "<body>\n\t<p>\n\t\tLorem\tEpsum\n\t</p>\n\t<pre>\n\t\tHello, World!\n\t</pre>\n\n\t<pre>\n\t\tThis\n\t\tis\n\t\tawesome\n\t</pre>\n\n\n</body>";

    var preview = document.getElementById("preview");
    preview.textContent = htmlString;

    document.getElementById("remove").onclick = function() {
        // Clean only the stretches of text that lie outside <pre> content.
        preview.textContent = htmlString.replace(/(?:^|<\/pre>)[^]*?(?:<pre>|$)/g, function(m) {
            return m.replace(/[\n\t]+/g, "");
        });
    };

    /* CSS */
    pre {
        background: #fffbec;
    }

    <!-- HTML -->
    <button id="remove">Remove</button>
    The pre below is only used to display the string; it is not one of the pre elements being processed.
    <pre id="preview"></pre>


    Regex101 Example.
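The pattern above only recognises a bare, lowercase `<pre>` tag. If the input may contain tags such as `<pre class="code">` or `<PRE>`, one possible variation (an assumption on my part, not something the original answer covers) is to allow attributes in the opening tag and add the `i` flag. The function name below is hypothetical.

```javascript
// Assumption: <pre> tags may carry attributes or be written in uppercase.
function removeBreaksOutsidePreLoose(html) {
    // <pre\b[^>]*> matches an opening tag with optional attributes;
    // the i flag makes both tag names case-insensitive.
    return html.replace(/(?:^|<\/pre>)[^]*?(?:<pre\b[^>]*>|$)/gi, function (outside) {
        return outside.replace(/[\n\t]+/g, "");
    });
}

// Hypothetical sample input with an attribute and an uppercase closing tag.
var loose = removeBreaksOutsidePreLoose('<div>\n<pre class="code">\n\tx\n</PRE>\n</div>');
console.log(JSON.stringify(loose));
```

This is still a regex-based approximation: an attribute value containing `>` would confuse it, so for arbitrary HTML a real parser is likely the safer choice.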