c# html-parsing html-agility-pack

How to get the value from a specific table cell with C# and Html-Agility-Pack


How do I get a value from a specific location in the second table in the document? I need the value from the cell in the second row down and the third column over in the HTML document below.

<html>
<head>
<title>Tables</title>
</head>
<body>
<table border="1">
  <tr>
    <th>Room</th>
    <th>Location</th>
  </tr>
  <tr>
    <td>Paint</td>
    <td>A4</td>
  </tr>
  <tr>
    <td>Stock</td>
    <td>B3</td>
  </tr>
  <tr>
    <td>Assy</td>
    <td>N9</td>
  </tr>
</table>
<p></p>
<table border="1">
  <tr>
    <th>Product</th>
    <th>Mat'l</th>
    <th>Weight</th>
    <th>Size</th>
  </tr>
  <tr>
    <td>Cover</td>
    <td>Plastic</td>
    <td>4</td>
    <td>16</td>
  </tr>
  <tr>
    <td>Retainer</td>
    <td>Steel</td>
    <td>12</td>
    <td>8</td>
  </tr>
  <tr>
    <td>Pin</td>
    <td>Bronze</td>
    <td>18</td>
    <td>7</td>
  </tr>
</table>
<p></p>
<table border="1">
  <tr>
    <th>Process</th>
    <th>Location</th>
    <th>Number</th>
  </tr>
  <tr>
    <td>Trim</td>
    <td>S2</td>
    <td>8</td>
  </tr>
  <tr>
    <td>Finish</td>
    <td>D2</td>
    <td>3</td>
  </tr>
</table>
</body>
</html>

Thanks!

Also... Please help a newbie out!!! Please direct me to a resource that can help me understand the syntax of Html-Agility-Pack (HAP). I have the CHM file for HAP - I've tried to use it and I've tried to use VS's object browser for HAP, but it's too cryptic for me at this point.


Solution

  • Html Agility Pack comes with an XPath evaluator that applies .NET XPath syntax to the parsed HTML nodes. Note that XPath expressions used with this library require element and attribute names to be lowercase, regardless of the casing in the original HTML source.
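
    To illustrate the lowercase rule, here is a minimal sketch. The class and method names (LowercaseXPathDemo, FirstCell) are made up for this example, and the markup is a one-cell table written with uppercase tags on purpose:

    ```csharp
    using System;
    using HtmlAgilityPack;

    public static class LowercaseXPathDemo
    {
        // Hypothetical demo method: parses uppercase markup and
        // queries it with a lowercase XPath expression.
        public static string FirstCell()
        {
            var doc = new HtmlDocument();
            // The source deliberately uses uppercase tag names.
            doc.LoadHtml("<TABLE><TR><TD>A4</TD></TR></TABLE>");
            // The XPath is written in lowercase regardless of the source casing.
            HtmlNode cell = doc.DocumentNode.SelectSingleNode("//table/tr/td");
            // SelectSingleNode returns null when nothing matches.
            return cell?.InnerText;
        }
    }
    ```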

    So in your case, you can get the cell for the 3rd column, 2nd row, 2nd table with an expression like this:

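    using System;
    using HtmlAgilityPack;

    HtmlDocument doc = new HtmlDocument();
    doc.Load(YourTestHtmlFilePath); // path to the HTML file shown in the question

    // 2nd table, 2nd row (the header row counts as row 1), 3rd cell
    HtmlNode node = doc.DocumentNode.SelectSingleNode("//table[2]/tr[2]/td[3]");
    Console.WriteLine(node.InnerText); // will output "4"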
    

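    //table selects TABLE elements anywhere under the root. [2] keeps the one that is the 2nd table child of its parent; in your document all three tables are children of BODY, so that is the 2nd table.

    /tr selects the TR elements that are direct children of that table. [2] takes the 2nd row; the header row counts as the 1st.

    /td selects the TD elements that are direct children of that row. [3] takes the 3rd cell.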
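
    If you need more than one cell, the same expression can be built from indexes. The following is only a sketch: TableCellReader and GetCellText are hypothetical names, not part of HAP. It assumes rows are direct children of the table (no TBODY in the source, as in your document) and uses (//table)[n], which picks the n-th table in document order rather than the n-th table child of each parent.

    ```csharp
    using System;
    using HtmlAgilityPack;

    public static class TableCellReader
    {
        // Hypothetical helper (not part of HAP): returns the text of one cell,
        // or null when the table, row, or column does not exist.
        // All three indexes are 1-based, matching XPath positions.
        public static string GetCellText(HtmlDocument doc, int table, int row, int column)
        {
            // Header rows count as rows, but TH cells are not matched by td.
            string xpath = $"(//table)[{table}]/tr[{row}]/td[{column}]";
            HtmlNode cell = doc.DocumentNode.SelectSingleNode(xpath);
            // SelectSingleNode returns null when nothing matches.
            return cell?.InnerText.Trim();
        }
    }
    ```

    With the document from the question loaded into doc, TableCellReader.GetCellText(doc, 2, 2, 3) addresses the same cell as the expression above. Checking the result for null before using it avoids a NullReferenceException when the page layout changes.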

    You can find good XPATH tutorials here: XPath Tutorial