Tags: c#, parsing, nodes, html-agility-pack

How to get the text between nodes in HtmlAgilityPack?


<tr class="td_center_color">
<td colspan="2" style="padding-bottom:10px;" class="span_str">
    <p>
        <span>Страниц: </span>260<br>
        <span>Страниц: </span>743<br>
    </p>
</td>
</tr>

How can I extract these numbers with HtmlAgilityPack? I can only get the first number, but I need all of them:

var startNode = document.DocumentNode.SelectSingleNode("//span[6]");
var endNode = document.DocumentNode.SelectSingleNode("//span[7]");
int startNodeIndex = startNode.ParentNode.ChildNodes.IndexOf(startNode);
int endNodeIndex = endNode.ParentNode.ChildNodes.IndexOf(endNode);
var nodes = startNode.ParentNode.ChildNodes.Where((n, index) => index > startNodeIndex && index < endNodeIndex).Select(n => n);
if (nodes != null)
{
    foreach (var htmlNode in nodes)
    {
        richTextBox1.AppendText(htmlNode.InnerText);
    }
}

Solution

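  • This should give you an IEnumerable<string> containing the values ["260", "743"]. The second XPath starts with '.' so that it searches only inside the selected 'td'; in HtmlAgilityPack an expression that starts with '//' searches the whole document, even when called on a child node.

    IEnumerable<string> values = document
        .DocumentNode
        .SelectSingleNode("//td[@class='span_str']")          // Select the 'td' with class 'span_str'
        .SelectNodes(".//span/following-sibling::text()[1]")  // Within that 'td', select the first text node after each 'span'
        .Select(node => node.InnerText.Trim());               // Read each text node's value, trimming surrounding whitespace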
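
The chained call above throws a NullReferenceException if either XPath finds nothing, because SelectSingleNode and (by default) SelectNodes return null when there is no match. Below is a minimal, self-contained sketch of the same idea with those checks added. The class name PageCountExtractor, the method ExtractValues and the SampleHtml constant are hypothetical names introduced only for this illustration; the markup is the sample from the question, and the relative XPath (leading '.') is an assumption about how the real page should be scoped.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class PageCountExtractor
{
    // Sample markup copied from the question.
    public const string SampleHtml = @"
<tr class=""td_center_color"">
<td colspan=""2"" style=""padding-bottom:10px;"" class=""span_str"">
    <p>
        <span>Страниц: </span>260<br>
        <span>Страниц: </span>743<br>
    </p>
</td>
</tr>";

    // Hypothetical helper: returns the trimmed text that directly follows each
    // 'span' inside the 'td' with class 'span_str', or an empty list if nothing matches.
    public static List<string> ExtractValues(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectSingleNode returns null when the 'td' is missing.
        HtmlNode cell = document.DocumentNode.SelectSingleNode("//td[@class='span_str']");
        if (cell == null)
        {
            return new List<string>();
        }

        // The leading '.' keeps the search inside 'cell'.
        // By default SelectNodes returns null, not an empty collection, when there is no match.
        HtmlNodeCollection textNodes = cell.SelectNodes(".//span/following-sibling::text()[1]");
        if (textNodes == null)
        {
            return new List<string>();
        }

        return textNodes
            .Select(node => node.InnerText.Trim())  // strip surrounding whitespace
            .Where(text => text.Length > 0)         // skip whitespace-only text nodes
            .ToList();
    }

    public static void Main()
    {
        foreach (string value in ExtractValues(SampleHtml))
        {
            Console.WriteLine(value);
        }
    }
}
```

If the numbers are needed as integers rather than strings, each returned value can be passed through int.TryParse, which avoids an exception when a text node turns out not to be numeric.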