How to get the value of a tag that has no class or id in HTML Agility Pack?


I am trying to get the text value of this <a> tag:

<a href="item?id=22513425">67&nbsp;comments</a>

So I'm trying to get '67' from this. However, there are no identifying classes or ids.

I've managed to get this far:

        IEnumerable<HtmlNode> commentsNode = htmlDoc.DocumentNode.Descendants(0).Where(n => n.HasClass("subtext"));

        var storyComments = commentsNode.Select(n =>
            n.SelectSingleNode("//a[3]")).ToList();

This only gives me "comments", annoyingly enough.

I can't use the href id, as there are many of these items, so I can't hardcode the href.

How can I extract the number as well?


Solution

  • Just use the @href attribute and a dedicated string function:

    substring-before(//a[@href="item?id=22513425"],"comments")
    

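    returns 67, followed by the non-breaking space (&nbsp;) that sits before "comments", so trim the result if you need only the digits.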

    EDIT: Since you can't hardcode the whole @href, you can match its prefix with starts-with instead. This is an XPath 1.0 solution.

    Shortest form (the text also has to contain "comments"):

    substring-before(//a[starts-with(@href,"item?") and text()[contains(.,"comments")]],"c")
    

    More restrictive (the text also has to end with "comments"):

    substring-before(//a[starts-with(@href,"item?")][substring(., string-length(.) - string-length('comments') + 1) = 'comments'],"c")
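The expressions above each return a single string, while the question needs one number per story. A minimal sketch of one way to do that in C# is shown here: it reuses the starts-with/contains predicate to select every matching link, then pulls the leading digits out with a regular expression instead of substring-before. It assumes the HtmlAgilityPack NuGet package is installed; the class name CommentCounts, the method Extract, and the sample markup are hypothetical and only modeled on the snippet in the question.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

public static class CommentCounts
{
    // Hypothetical sample markup, modeled on the <a> tag from the question.
    public const string SampleHtml =
        "<table>" +
        "<tr><td class='subtext'><a href='user?id=a'>a</a> <a href='item?id=22513425'>67&nbsp;comments</a></td></tr>" +
        "<tr><td class='subtext'><a href='user?id=b'>b</a> <a href='item?id=22513426'>discuss</a></td></tr>" +
        "<tr><td class='subtext'><a href='user?id=c'>c</a> <a href='item?id=22513427'>5&nbsp;comments</a></td></tr>" +
        "</table>";

    public static int[] Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same predicate as the "shortest form" above, but selecting nodes
        // rather than evaluating a string function.
        var links = doc.DocumentNode.SelectNodes(
            "//a[starts-with(@href,'item?') and contains(.,'comments')]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (links == null) return Array.Empty<int>();

        return links
            .Select(a => Regex.Match(a.InnerText, @"\d+")) // first run of digits in the link text
            .Where(m => m.Success)
            .Select(m => int.Parse(m.Value))
            .ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Extract(SampleHtml))); // prints "67, 5"
    }
}
```

Doing the digit extraction in C# sidesteps the &nbsp; between the number and "comments", since the regular expression only looks at the digits. If you need to scope the search to one story at a time, call SelectNodes on that story's node and start the expression with .// so it is evaluated relative to that node rather than the whole document.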