c#, html, nodes, html-agility-pack

I need to get specific values from a node with HtmlAgilityPack


I need to extract some data from a page whose HTML uses poorly named classes. The HTML looks something like the following:

<div class="container-entry">
    <h1 class="entry-heading">Aarakocra</h1>
    <div class="entry-metadata">
        <h2 class="entry-metadata-label">Armor Class: </h2>
        <h2 class="entry-metadata-label">12</h2>
    </div>
    <div class="entry-metadata">
        <h2 class="entry-metadata-label">hit Points: </h2>
        <h2 class="entry-metalabel-content">13 (3d8)</h2>
    </div>
</div>

In this example, I am trying to get the values "12" and "13 (3d8)".

So far I've tried this:

HtmlAgilityPack.HtmlWeb website = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument pageMonsterStats = website.Load(websiteUrl + "/" + monsterName);
var monsterNode = pageMonsterStats.DocumentNode.SelectSingleNode("//div[@class='container-entry']");
Console.WriteLine(monsterNode.Descendants("div").Where(node => node.Equals("Armor Class: ")).ToString());

I expected to get the index of the element which contains "Armor Class: ", which I would then use to get the value ("12") from the same element, but this prints "System.Linq.Enumerable+WhereEnumerableIterator`1[HtmlAgilityPack.HtmlNode]" instead.


Solution

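  • That is because Where returns a lazy IEnumerable<HtmlNode>, and calling ToString() on it prints the iterator's type name rather than any node. Use First or Last to pick a single node, or join the results into a string. Note also that node.Equals("Armor Class: ") compares an HtmlNode with a string and is therefore never true; compare the node's InnerText instead, and search the h2 elements, since those are what hold the text:

    Console.WriteLine(monsterNode.Descendants("h2").First(node => node.InnerText.Trim() == "Armor Class:").InnerText);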
    

    In your case you may want to do something like this:

    using System;
    using System.Linq;
    
    public class Program
    {
        public static void Main()
        {
            const string html = @"
    <div class=""container-entry"">
      <h1 class=""entry-heading"">Aarakocra</h1>
      <div class=""entry-metadata"">
        <h2 class=""entry-metadata-label"">Armor Class: </h2>
        <h2 class=""entry-metadata-label"">12</h2>
      </div>
      <div class=""entry-metadata"">
        <h2 class=""entry-metadata-label"">hit Points: </h2>
        <h2 class=""entry-metalabel-content"">13 (3d8)</h2>
      </div>
    </div>
    ";
            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(html);
            var monsterNode = doc.DocumentNode.SelectSingleNode("//div[@class='container-entry']");

            // Flatten every <h2> under each <div> into one array of texts:
            // ["Armor Class: ", "12", "hit Points: ", "13 (3d8)"]
            var data = monsterNode.Descendants("div")
                .SelectMany(x => x.Descendants("h2"))
                .Select(x => x.InnerText)
                .ToArray();
            var armorClass = data[1];
            var hitPoints = data[3]; // if you want

            Console.WriteLine(armorClass); // output 12
        }
    }
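
The fixed indexes above (data[1], data[3]) break as soon as the page adds or reorders an entry. As a rough sketch of an alternative, each "entry-metadata" div could be read as a label/value pair and looked up by label. The MonsterStats class and its Parse method are hypothetical names invented for this illustration, and the sketch assumes every entry holds exactly one label h2 followed by one value h2, as in the sample HTML:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class MonsterStats
{
    // Maps each label (e.g. "Armor Class") to its value (e.g. "12").
    public static Dictionary<string, string> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("div")
            // Keep only the label/value wrappers.
            .Where(div => div.GetAttributeValue("class", "") == "entry-metadata")
            // Direct <h2> children: assumed to be [label, value].
            .Select(div => div.Elements("h2").ToList())
            .Where(h2s => h2s.Count >= 2)
            .ToDictionary(
                h2s => h2s[0].InnerText.Trim().TrimEnd(':'), // "Armor Class: " -> "Armor Class"
                h2s => h2s[1].InnerText.Trim(),
                StringComparer.OrdinalIgnoreCase);           // "hit Points" matches "Hit Points"
    }
}
```

With the sample HTML, MonsterStats.Parse(html)["Armor Class"] should give "12" and MonsterStats.Parse(html)["Hit Points"] should give "13 (3d8)". ToDictionary throws if two entries share a label, so a page with repeated labels would need a different collection.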
    
