HtmlAgilityPack - SelectSingleNode for descendants


I found that HtmlAgilityPack's SelectSingleNode always searches from the root of the original DOM, even when I call it on a child node. Is there a way to make it start from the node it is called on?

Sample HTML

<html>
  <body>
    <a href="https://home.com">Home</a>
    <div id="contentDiv">
    <tr class="blueRow">
        <td scope="row"><a href="https://iwantthis.com">target</a></td>
    </tr>
    </div>
  </body>
</html>

Code that does not work

// Expected: iwantthis.com, actual: home.com
string url = contentDiv.SelectSingleNode("//tr[@class='blueRow']")
                       .SelectSingleNode("//a") // What should this be?
                       .GetAttributeValue("href", "");

As a workaround, I had to replace the code above with this:

    var tds = contentDiv.SelectSingleNode("//tr[@class='blueRow']").Descendants("td");
    string url = "";
    foreach (HtmlNode td in tds)
    {
        if (td.Descendants("a").Any())
        {
            url = td.ChildNodes.First().GetAttributeValue("href", "");
        }
    }

I am using HtmlAgilityPack 1.7.4 on .NET Framework 4.6.2.


Solution

  • An XPath expression that begins with // is always evaluated from the root of the document, no matter which node you call it on. SelectSingleNode("//a") means "start at the root of the document and find the first a anywhere in the document", which is why it returns the Home link.

    To start from the current node instead, prefix the expression with . (which refers to the context node). SelectSingleNode(".//a") means "find the first a anywhere beneath the current node".

    So your code would look like this:

    string url = contentDiv.SelectSingleNode(".//tr[@class='blueRow']")
                           .SelectSingleNode(".//a")
                           .GetAttributeValue("href", "");
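
For anyone who wants to try this end to end, here is a minimal sketch built on the sample HTML from the question. The question does not show how contentDiv was obtained, so the GetElementbyId lookup is an assumption, and the class and method names (RelativeXPathDemo, FindRowLink) are made up for illustration. It also folds the two chained calls into a single relative XPath and adds null checks, since SelectSingleNode returns null when nothing matches.

```csharp
using System;
using HtmlAgilityPack;

public static class RelativeXPathDemo
{
    // Sample markup copied from the question.
    public const string SampleHtml = @"<html>
  <body>
    <a href=""https://home.com"">Home</a>
    <div id=""contentDiv"">
    <tr class=""blueRow"">
        <td scope=""row""><a href=""https://iwantthis.com"">target</a></td>
    </tr>
    </div>
  </body>
</html>";

    // Hypothetical helper: returns the href of the first link inside the
    // blue row under #contentDiv, or an empty string if nothing matches.
    public static string FindRowLink(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: contentDiv is looked up by its id attribute.
        HtmlNode contentDiv = doc.GetElementbyId("contentDiv");

        // The leading "." anchors the search at contentDiv, so the Home
        // link outside the div is never considered.
        HtmlNode link = contentDiv?.SelectSingleNode(".//tr[@class='blueRow']//a");

        // Guard against a missing div or a missing link.
        return link?.GetAttributeValue("href", "") ?? "";
    }

    public static void Main()
    {
        Console.WriteLine(FindRowLink(SampleHtml));
    }
}
```

With the sample markup this should print the target link rather than the Home link. The null-conditional operators need C# 6 or later; on older compilers, replace them with explicit null checks.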