Tags: c#, html, html-agility-pack

How to remove blank lines from HTML with HTMLAgilityPack?


I have an HTML document that contains lots of unnecessary blank lines, which I'd like to remove. Here's a sample of the HTML:

<html>

<head>


</head>

<body>

<h1>Heading</h1>

<p>Testing

I've tried the following code, but it removes every newline. I only want to remove the blank lines.

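static string RemoveLineReturns(string html)
{
    // Replace() strips every line break, including the ones that end
    // lines with content, not just the blank lines.
    html = html.Replace(Environment.NewLine, "");
    return html;
}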
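For comparison with the attempt above, which strips every line break, here is a minimal sketch that removes only the blank lines. It uses plain string handling rather than HtmlAgilityPack, and the class and method names (BlankLineFilter, RemoveBlankLines) are hypothetical. It assumes a line counts as blank when it is empty or contains only whitespace, and it rewrites all line endings to Environment.NewLine.

```csharp
using System;
using System.Linq;

public static class BlankLineFilter
{
    // Hypothetical helper: keeps every line that has visible content and
    // drops lines that are empty or contain only whitespace.
    public static string RemoveBlankLines(string html)
    {
        // "\r\n" is listed first so a Windows line ending is treated as one
        // separator; lone "\n" and "\r" are handled as well.
        string[] lines = html.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);

        // Rejoin the surviving lines with the platform line ending
        // (this normalizes all line endings to Environment.NewLine).
        return string.Join(Environment.NewLine,
            lines.Where(line => !string.IsNullOrWhiteSpace(line)));
    }
}
```

Usage: html = BlankLineFilter.RemoveBlankLines(html); Indentation of non-blank lines is left untouched, since only whole blank lines are dropped.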

Any idea how to do this with HTMLAgilityPack? Thanks, J.


Solution

  • I don't think HTMLAgilityPack currently offers a built-in solution for that.

    For such scenarios I use the following Regex:

    // requires: using System.Text.RegularExpressions;
    html = Regex.Replace(html, @"( |\t|\r?\n)\1+", "$1");
    

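    This keeps single spaces, tabs and line endings intact while collapsing each run of repeated spaces, tabs or line breaks into a single one. Because the backreference \1 only matches the exact character that was captured, a blank line that contains spaces or tabs (for example "\n  \n") is shortened but not removed. The pattern also applies to text content, so repeated spaces inside elements such as <pre> are collapsed too.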
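A minimal alternative sketch, if whitespace-only lines should disappear entirely: a multiline regex that deletes every line consisting only of spaces or tabs (including empty lines), together with its line break. This is not part of the answer above; the class and method names are hypothetical, and, unlike the regex above, it leaves runs of spaces inside lines with content untouched.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BlankLineRegex
{
    // In Multiline mode '^' matches at the start of every line, so this
    // pattern matches a line holding only spaces/tabs plus its line break
    // ("\n" or "\r\n"). Replacing the match with "" deletes that line.
    private static readonly Regex BlankLine =
        new Regex(@"^[ \t]*\r?\n", RegexOptions.Multiline);

    // Hypothetical helper name.
    public static string RemoveBlankLines(string html)
    {
        return BlankLine.Replace(html, string.Empty);
    }
}
```

Usage: html = BlankLineRegex.RemoveBlankLines(html); A whitespace-only last line with no trailing line break is not matched by this pattern and is kept.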