I am trying to get the text value of this <a> tag:
<a href="item?id=22513425">67 comments</a>
So I'm trying to get '67' from this. However, there are no identifying classes or ids.
I've managed to get this far:
IEnumerable<HtmlNode> commentsNode = htmlDoc.DocumentNode.Descendants(0).Where(n => n.HasClass("subtext"));
var storyComments = commentsNode.Select(n =>
n.SelectSingleNode("//a[3]")).ToList();
This only gives me "comments", annoyingly enough.
I can't use the href id, as there are many of these items, so I can't hardcode the href.
How can I extract the number as well?
Just use the @href attribute and a dedicated string function:
substring-before(//a[@href="item?id=22513425"],"comments")
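returns "67 " (with a trailing space, since only the part before "comments" is kept). Wrap the expression in normalize-space() to trim it.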
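In HtmlAgilityPack, SelectSingleNode only returns nodes, so an expression that yields a string (like substring-before) has to be evaluated through an XPathNavigator instead. A minimal sketch, assuming the HtmlAgilityPack package is referenced and using the sample markup from the question; the class and method names are hypothetical:

```csharp
using System;
using System.Xml.XPath;
using HtmlAgilityPack;

public static class CommentCountDemo
{
    // Hypothetical helper: evaluates the string-valued XPath from the answer.
    public static string ExtractCount(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        // substring-before() produces a string, not a node set,
        // so use XPathNavigator.Evaluate rather than SelectSingleNode.
        XPathNavigator navigator = htmlDoc.CreateNavigator();
        object result = navigator.Evaluate(
            "substring-before(//a[@href='item?id=22513425'],'comments')");

        // The raw result keeps the whitespace before "comments"; trim it.
        return Convert.ToString(result).Trim();
    }

    public static void Main()
    {
        Console.WriteLine(
            ExtractCount("<a href=\"item?id=22513425\">67 comments</a>"));
    }
}
```

Trimming in C# rather than in the XPath keeps the expression short; normalize-space() inside the XPath should work just as well.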
EDIT: Since you can't hardcode the full value of @href, you can use starts-with() instead. Both expressions below are XPath 1.0.
Shortest form (the text has to contain "comments"):
substring-before(//a[starts-with(@href,"item?") and text()[contains(.,"comments")]],"c")
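More restrictive (the text has to end with "comments"). Note the use of "." inside the predicate, so that each candidate <a> is tested against its own text rather than against the first <a> in the document:
substring-before(//a[starts-with(@href,"item?")][substring(., string-length(.) - string-length('comments')+1) = 'comments'],"c")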
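Because a string function such as substring-before only looks at the first matching node, pages with many items may be easier to handle by selecting all matching links and parsing the number in C#. A rough sketch under that assumption, using HtmlAgilityPack; the CommentCounts class and Extract method are hypothetical names, and the DeEntitize call is only there in case the page encodes the space as an entity:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class CommentCounts
{
    // Hypothetical helper: returns the leading number of every comments link.
    public static List<int> Extract(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html);

        var counts = new List<int>();

        // Match links by href prefix and link text instead of a hardcoded id.
        var links = htmlDoc.DocumentNode.SelectNodes(
            "//a[starts-with(@href,'item?') and contains(.,'comment')]");

        // SelectNodes may return null when nothing matches.
        if (links == null) return counts;

        foreach (var link in links)
        {
            // Decode entities, then capture the digits at the start of the text.
            string text = HtmlEntity.DeEntitize(link.InnerText);
            Match m = Regex.Match(text, @"^\s*(\d+)");
            if (m.Success) counts.Add(int.Parse(m.Groups[1].Value));
        }
        return counts;
    }
}
```

To scope the search to a single story, the same XPath can be run from a "subtext" node with a leading dot (".//a[...]"), since an expression starting with "//" always searches the whole document.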