Splitting HTML string with HtmlAgilityPack


I have HTML like this:

<div class="classA">
Content
</div>
<div class="classA">
Content
</div>
 // another ClassA ....

 <div class="classA">
 <blockquote>Some key</blockquote >
 </div>

How can I remove the outer HTML of the element containing "Some key", or get all the HTML above the div that contains "Some key", using HtmlAgilityPack?

That is, the result I want is:

 <div class="classA">
Content
</div>
<div class="classA">
Content
</div>
 // another ClassA ....

Solution

  • XPath is your friend.

    This returns the expected result with a single query:

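    // requires: using System; using HtmlAgilityPack;
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(html);
    // SelectNodes returns null (not an empty collection) when nothing matches
    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
        "//blockquote[text()='Some key']/parent::*/preceding::*");
    if (nodes != null)
        foreach (HtmlNode node in nodes)
            Console.WriteLine(node.OuterHtml);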
    

    where

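    • //blockquote[text()='Some key'] selects the blockquote element whose text is exactly Some key. If it must be a direct child of <div class="classA">, use the more precise path expression //div[@class='classA']/blockquote[text()='Some key']
    • parent::* selects the parent element, which is <div class="classA">
    • preceding::* selects every element that ends before that node in document order, excluding its ancestors. This includes elements nested inside the earlier divs, not only the divs themselves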

    Demo: https://dotnetfiddle.net/BlQ3w9
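
    If the earlier divs may contain nested elements, preceding::* would return those nested elements as separate matches, so their markup would be printed more than once. A minimal sketch of one possible variant, assuming the divs are siblings as in the question, uses preceding-sibling instead. The sample HTML (including the nested <b>) and the Program class are made up for illustration:

    ```csharp
    using System;
    using HtmlAgilityPack;

    class Program
    {
        static void Main()
        {
            // Hypothetical input modeled on the question; the nested <b> is an added assumption
            string html =
                "<div class=\"classA\"><b>Content</b> 1</div>" +
                "<div class=\"classA\">Content 2</div>" +
                "<div class=\"classA\"><blockquote>Some key</blockquote></div>";

            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Find the div that holds the key, then take only its earlier sibling divs.
            // Unlike preceding::*, preceding-sibling does not descend into those divs.
            HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(
                "//div[@class='classA'][blockquote[text()='Some key']]" +
                "/preceding-sibling::div[@class='classA']");

            // SelectNodes returns null when the XPath matches nothing
            if (nodes == null)
            {
                Console.WriteLine("No match");
                return;
            }

            foreach (HtmlNode node in nodes)
                Console.WriteLine(node.OuterHtml);
        }
    }
    ```

    With this query each earlier div is matched once, and the nested <b> appears only as part of its parent's OuterHtml. If the divs are not siblings in your real markup, the original preceding::* query is the more general choice.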