When I parse an HTML5 document such as:
<p>Content</p>
using HtmlAgilityPack with default options, parsing succeeds, but the constructed HtmlDocument does not include the <html> and <body> elements that the standard HTML5 parsing algorithm would construct.
Are there options I am missing that would do this?
Or is there some other library for .NET 6 that I should be using instead?
I have concluded that, unless the functionality is very well hidden, HtmlAgilityPack does not offer this capability.
I discovered the package AngleSharp, which seems to meet my requirement.
Well, almost. Parsing <p>Content</p>, I get:
<?xml version="1.0" encoding="UTF-8"?>
<HTML xmlns="http://www.w3.org/1999/xhtml"><HEAD/>
<BODY><P>Content</P></BODY></HTML>
I need to do a bit of further work to get the element names in lower case, but we're close.
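For anyone following along, here is a minimal sketch of the AngleSharp approach, assuming the AngleSharp NuGet package is referenced and the project uses .NET 6 top-level statements. The variable names are my own. On the casing: the DOM reports HTML tag names in upper case through NodeName, while LocalName and AngleSharp's own HTML serialization use lower case. I have not confirmed which code path produced the upper-case output above, so treat that explanation as a guess.

```csharp
using System;
using AngleSharp.Html.Parser;

// HtmlParser follows the HTML5 tree-construction rules, so the implied
// <html>, <head>, and <body> elements are created for a bare fragment.
var parser = new HtmlParser();
var document = parser.ParseDocument("<p>Content</p>");

// OuterHtml serializes with AngleSharp's HTML formatter rather than an
// XML writer, so tag names are emitted as lowercase HTML.
Console.WriteLine(document.DocumentElement.OuterHtml);

// When walking the tree yourself, LocalName gives the lowercase name,
// while NodeName gives the upper-case form for HTML elements.
var body = document.Body!;
Console.WriteLine($"{body.LocalName} / {body.NodeName}");
```

If the output has to be well-formed XML rather than HTML, building it from LocalName instead of NodeName should avoid a separate lower-casing pass.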