I'm using HtmlAgilityPack to extract some data from the following HTML, taken from a website:
<div class="pull-right">
<ul class="list-inline">
<li class="social">
<a target="_blank" href="https://www.facebook.com/wsat.a?ref=ts&fref=ts" class="">
<i class="icon fa fa-facebook" aria-hidden="true"></i>
</a>
</li>
<li class="social">
<a target="_blank" href="https://twitter.com/wsat_News" class="">
<i class="icon fa fa-twitter" aria-hidden="true"></i>
</a>
</li>
<li>
<a href="/user" class="hide">
<i class=" icon fa fa-user" aria-hidden="true"></i>
</a>
</li>
<li>
<a onclick="ga('send', 'event', 'PDF', 'Download', '');" href="https://wsat.com/pdf/issue15170/index.html" target="_blank" class="">
PDF
<i class="icon fa fa-file-pdf-o" aria-hidden="true"></i>
</a>
</li>
I've managed to write the code below, which extracts the first link in the HTML (https://www.facebook.com/wsat.a?ref=ts&fref=ts). However, what I actually want is the PDF link, https://wsat.com/pdf/issue15170/index.html, and I haven't had any luck getting it. How do I specify which link to extract?
var url = "https://wsat.com/";
var HttpClient = new HttpClient();
var html = await HttpClient.GetStringAsync(url);
var htmlDocument = new HtmlDocument();
htmlDocument.LoadHtml(html);
var links = htmlDocument.DocumentNode.Descendants("div").Where(node => node.GetAttributeValue("class", "").Equals("pull-right")).ToList();
var alink = links.First().Descendants("a").FirstOrDefault().ChildAttributes("href")?.FirstOrDefault().Value;
await Launcher.OpenAsync(alink);
Use an XPath expression as a selector:
var alink = htmlDocument.DocumentNode
    .SelectSingleNode("//li/a[contains(@onclick, 'PDF')]")
    .GetAttributeValue("href", "");
Explanation of the XPath (as requested):
Match an li tag at any depth in the document that has an immediate child a tag whose onclick attribute contains the string 'PDF'.
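One thing to watch for: SelectSingleNode returns null when nothing matches, so chaining GetAttributeValue directly onto it throws if the site changes its markup. Below is a minimal sketch of the same selector with a null check. It assumes the HtmlAgilityPack NuGet package is installed, and it runs against a trimmed copy of the markup from the question (inlined as a string purely for illustration) rather than the live page. The variable name pdfAnchor is my own.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Trimmed copy of the markup from the question, so no network call is needed.
        const string html = @"
<ul class=""list-inline"">
  <li class=""social""><a target=""_blank"" href=""https://twitter.com/wsat_News"" class=""""></a></li>
  <li><a onclick=""ga('send', 'event', 'PDF', 'Download', '');"" href=""https://wsat.com/pdf/issue15170/index.html"" target=""_blank"" class="""">PDF</a></li>
</ul>";

        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html);

        // Same XPath as above: an <a> directly under an <li>,
        // whose onclick attribute contains 'PDF'.
        var pdfAnchor = htmlDocument.DocumentNode
            .SelectSingleNode("//li/a[contains(@onclick, 'PDF')]");

        // SelectSingleNode returns null when nothing matches.
        if (pdfAnchor == null)
        {
            Console.WriteLine("No PDF link found.");
            return;
        }

        var alink = pdfAnchor.GetAttributeValue("href", "");
        Console.WriteLine(alink); // prints https://wsat.com/pdf/issue15170/index.html
    }
}
```

In your app you would keep loading the HTML with GetStringAsync as you do now, and only pass alink to Launcher.OpenAsync once the null check has passed.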