c# xpath web-scraping html-agility-pack

HTMLAgilityPack - Get element in class by class


I want to get the text of the H2 element (highlighted) inside each DIV with the class 'listicle-page', shown below. My current code returns the text of everything inside the DIV, but I only need the text of the H2 it contains.

Consider the following HTML:

Click here to see HTML

Here is my current code:

private void getFact()
{
    HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
    HtmlAgilityPack.HtmlDocument doc = web.Load("https://www.rd.com/culture/interesting-facts/");

    var headerNames = doc.DocumentNode.SelectNodes("//div[@class='listicle-page']").ToList();

    foreach (var item in headerNames)
    {
        MessageBox.Show(item.InnerText);
    }
}

Solution

  • Your XPath //div[@class='listicle-page'] selects the div nodes themselves, and InnerText on a div returns the concatenated text of all of its descendants. To select only the h2 that is a direct child of that div, append /h2 to the expression:

    //div[@class='listicle-page']/h2
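
As a rough illustration of that XPath in use, here is a minimal sketch that parses an HTML string instead of loading the live page. The ExtractHeadings helper, the ListicleHeadings class, and the sample markup are assumptions made up for this example, not part of the original question. The null check is there because SelectNodes returns null, not an empty collection, when nothing matches.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class ListicleHeadings
{
    // Hypothetical helper: returns the text of every <h2> that is a direct
    // child of a <div class="listicle-page"> in the given HTML.
    public static List<string> ExtractHeadings(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when the XPath matches nothing,
        // so guard before enumerating.
        var nodes = doc.DocumentNode.SelectNodes("//div[@class='listicle-page']/h2");
        if (nodes == null)
        {
            return new List<string>();
        }

        // Decode HTML entities and trim surrounding whitespace.
        return nodes
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .ToList();
    }

    public static void Main()
    {
        // Made-up sample markup standing in for the real page.
        const string sample =
            "<div class='listicle-page'><h2>Fact one</h2><p>Details.</p></div>" +
            "<div class='listicle-page'><h2>Fact two</h2><p>More details.</p></div>";

        foreach (var heading in ExtractHeadings(sample))
        {
            Console.WriteLine(heading);
        }
    }
}
```

Two caveats, depending on the actual markup: if the h2 is nested deeper than a direct child, //div[@class='listicle-page']//h2 would be needed instead, and if the div carries several classes, @class='listicle-page' will not match and a test such as contains(concat(' ', normalize-space(@class), ' '), ' listicle-page ') is the usual XPath 1.0 workaround.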