
Parsing through innerHTML with HtmlAgilityPack


I'm trying to figure out how to extract information from nodes I have already selected.

foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//div [@class=\"result-link\"]"))
{
    if (node == null)
        Console.WriteLine("debug");
    else
    {
        //string h_url = node.Attributes["a"].Value;
        Console.WriteLine(node.InnerHtml);
    }
}

You can kind of see what I am trying to do from the commented-out 'string h_url' declaration. Inside each div with class "result-link" there is an a element, and I am trying to grab the value of its href attribute. So the link, basically.

Can't seem to figure it out. I have tried using the Attributes array:

string h_url = node.Attributes["//a[@href]"].Value;

With no luck.


Solution

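  • The Attributes collection is keyed by attribute name and only covers the attributes of the node itself (here the div), so it cannot take an XPath expression. Instead, use XPath to select the a element relative to the current node, then read its href attribute: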

    HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//div[@class='result-link']");
    if (nodes != null)
    {
        foreach (HtmlNode node in nodes)
        {
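            // "a[@href]" matches only direct children of the div;
            // use ".//a[@href]" if the link is nested deeper.
            HtmlNode a = node.SelectSingleNode("a[@href]");
            if (a != null)
            {
                string h_url = a.Attributes["href"].Value;
                Console.WriteLine(h_url);
            }

            // etc...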
        }
    }
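
For reference, here is a minimal self-contained sketch of the same approach as a console program. The sample HTML string and the example.com URLs are made up for illustration and stand in for the real page. It assumes the link may sit deeper than a direct child of the div, so it uses ".//a[@href]" rather than "a[@href]", and it reads the attribute with GetAttributeValue, which returns a default value when the attribute is absent.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical sample markup standing in for the real page.
        const string html = @"
<div class='result-link'><a href='https://example.com/one'>One</a></div>
<div class='result-link'><span><a href='https://example.com/two'>Two</a></span></div>
<div class='result-link'>no link here</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so check before iterating.
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div[@class='result-link']");
        if (divs == null)
        {
            Console.WriteLine("No result-link divs found.");
            return;
        }

        foreach (HtmlNode div in divs)
        {
            // The leading "." makes the search relative to this div;
            // "//" then looks through all of its descendants.
            HtmlNode a = div.SelectSingleNode(".//a[@href]");
            if (a == null)
                continue; // this div contains no link

            string hUrl = a.GetAttributeValue("href", string.Empty);
            Console.WriteLine(hUrl);
        }
    }
}
```

Note that the leading "." matters: an expression starting with "//a" is evaluated from the document root even when called on a child node, so it would return the first link in the whole document for every div. In this sketch the third div is skipped because it contains no a element.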