Consider the following HTML fragment (_ is used for whitespace):
<head>
...
<link ... ___/>
<!-- ... -->
...
</head>
I'm using Html Agility Pack (HAP) to read HTML files/fragments and to strip out links. What I want to do is find the LINK (and some other) elements and then replace them with whitespace, like so:
<head>
...
____________
<!-- ... -->
...
</head>
The parsing part seems to work so far: I get the nodes I'm looking for. However, HAP tries to fix the HTML content, while I need everything to stay exactly the same except for the changes I'm making. HAP also seems to have quite a few bugs when writing back content it read in previously, so the approach I want to take is to let HAP parse the input, then go back to the original input and replace the content I don't want.
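Once the start offset and length of each unwanted node in the original input are known, the replace-with-whitespace step itself needs nothing from HAP. The following is a minimal sketch under that assumption; HtmlBlanker and BlankOut are hypothetical names, and the (Start, Length) pairs are assumed to come from the parsed nodes.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of HAP.
public static class HtmlBlanker
{
    // Overwrites each (Start, Length) range of the original input with spaces.
    // Line breaks inside a range are kept so that line numbers in the output
    // still match the input (a design choice, not something HAP requires).
    public static string BlankOut(string input, IEnumerable<(int Start, int Length)> ranges)
    {
        var sb = new StringBuilder(input);
        foreach (var (start, length) in ranges)
        {
            if (start < 0 || length < 0 || start + length > input.Length)
                throw new ArgumentOutOfRangeException(nameof(ranges));

            for (int i = start; i < start + length; i++)
            {
                if (sb[i] != '\r' && sb[i] != '\n')
                    sb[i] = ' ';
            }
        }
        return sb.ToString();
    }
}
```

Because every range is overwritten in place with the same number of characters, the offsets of the remaining ranges stay valid, so the ranges can be applied in any order.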
The problem is that HtmlNode doesn't seem to have an input length property. It has StreamPosition, which seems to indicate where reading of the node's content started within the input, but I couldn't find a length property that tells me how many characters were consumed to build the node.
I tried using the OuterHtml property but, unfortunately, HAP tries to fix the LINK by removing the ___/ part (a LINK element is not supposed to be closed). Because of this, OuterHtml.Length doesn't match the number of characters the node occupies in the original input.
Is there a way in HAP to get this information?
I ended up modifying the code of HtmlAgilityPack to expose a new property that returns the private _outerlength field of HtmlNode:
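// Added to HtmlNode: exposes the private _outerlength field, which holds
// how many characters of the original input were consumed to build this node.
public virtual int OuterLength
{
    get
    {
        return ( _outerlength );
    }
}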
This seems to be working fine so far.
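If patching HAP is not an option, one possible alternative for elements without content, such as LINK, is to measure the start tag directly in the original input, beginning at StreamPosition. This is a sketch of a workaround, not HAP functionality: it assumes StreamPosition is the index of the element's opening < (worth verifying against your HAP version), RawTag and StartTagLength are hypothetical names, and it stops at the first > that is not inside a quoted attribute value, so it does not handle elements that have content or a separate closing tag.

```csharp
using System;

// Hypothetical helper, not part of HAP.
public static class RawTag
{
    // Returns how many characters of `input` the start tag beginning at `start`
    // occupies, from '<' up to and including its closing '>'.
    // Assumes `start` is HtmlNode.StreamPosition and points at the tag's '<'.
    // Quoted attribute values are skipped so that a '>' inside them is not
    // mistaken for the end of the tag. For void elements such as LINK, the
    // start tag is the whole element, including any trailing "/".
    public static int StartTagLength(string input, int start)
    {
        if (start < 0 || start >= input.Length || input[start] != '<')
            throw new ArgumentException("start must point at a '<'", nameof(start));

        char quote = '\0'; // '\0' means "not inside a quoted attribute value"
        for (int i = start + 1; i < input.Length; i++)
        {
            char c = input[i];
            if (quote != '\0')
            {
                if (c == quote) quote = '\0'; // end of the quoted value
            }
            else if (c == '"' || c == '\'')
            {
                quote = c; // start of a quoted value
            }
            else if (c == '>')
            {
                return i - start + 1;
            }
        }
        throw new FormatException("Unterminated tag starting at index " + start);
    }
}
```

For a LINK node, the length returned for node.StreamPosition should then include the trailing ___/ that OuterHtml drops, since it is measured against the original text rather than HAP's rewritten markup.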