
Why does HtmlAgilityPack remove my closing tag?


I'm using HtmlAgilityPack in .NET (C#) and have some HTML like this:

<p><ol><li>A bunch of text</li></ol><em>some em text</em> more text here.</p>

I then load it into an HtmlDocument with LoadHtml and write it out with Save, but I end up with:

<p><ol><li>A bunch of text</li></ol><em>some em text</em> more text here.

The final closing </p> tag is gone.

Why is this happening, and how can I fix it?


Solution

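  • As others said in the comments, the input is invalid HTML: a <p> element cannot contain a list such as <ol> (HTML closes the paragraph implicitly when an <ol> starts), so the trailing </p> has no matching open paragraph. That is probably why HtmlDocument drops the final </p> when it re-serializes the document through Save. As a workaround, skip Save and write document.Text, which holds the markup exactly as it was passed to LoadHtml, to the output location with System.IO.File. Note that this writes back the original markup unchanged, so it only helps if you don't need to modify the document through the DOM.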

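    using System.IO;
    using HtmlAgilityPack;

    var html = "<p><ol><li>A bunch of text</li></ol><em>some em text</em> more text here.</p>";
    var document = new HtmlDocument();
    document.LoadHtml(html);
    // document.Text is the markup as passed to LoadHtml, not re-serialized from the parsed tree
    File.WriteAllText("insert_your_path_here", document.Text);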
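To check what the parser actually did with this markup, here is a minimal diagnostic sketch. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HapRoundTrip and its Inspect method are hypothetical helpers, not part of the library. It returns document.Text (the markup as passed to LoadHtml), DocumentNode.OuterHtml (the markup rebuilt from the parsed node tree, which should match what Save writes) and the entries in document.ParseErrors.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (not part of HtmlAgilityPack) for comparing the raw input
// with what HtmlAgilityPack produces after parsing it.
public static class HapRoundTrip
{
    public static (string Raw, string Serialized, string[] Errors) Inspect(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Format each parse error HtmlAgilityPack recorded while loading.
        var errors = document.ParseErrors
            .Select(e => $"line {e.Line}, col {e.LinePosition}: {e.Code} ({e.Reason})")
            .ToArray();

        // document.Text is the original input string;
        // DocumentNode.OuterHtml is regenerated from the parsed node tree.
        return (document.Text, document.DocumentNode.OuterHtml, errors);
    }
}
```

Passing the markup from the question to Inspect and printing the three values lets you compare the raw and re-serialized versions side by side and see whether HtmlAgilityPack reported a parse error for the stray </p>.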