How to extract a specific link in C#?

I'm using HtmlAgilityPack to extract some data from a website with the following markup:

 <div class="pull-right">
          <ul class="list-inline">
            <li class="social">
              <a target="_blank" href=";fref=ts" class="">
                <i class="icon fa fa-facebook" aria-hidden="true"></i>
            <li class="social">
              <a target="_blank" href="" class="">
                <i class="icon fa fa-twitter" aria-hidden="true"></i>
                <a href="/user" class="hide">
                <i class=" icon fa fa-user" aria-hidden="true"></i>
              <a onclick="ga('send', 'event', 'PDF', 'Download', '');" href="" target="_blank" class="">

                <i class="icon fa fa-file-pdf-o" aria-hidden="true"></i>

I've managed to write the code below, which extracts the first link in the HTML. However, all I want is the link to the PDF, and I haven't had any luck getting it. How do I specify which link to extract?

        var url = "";
        var httpClient = new HttpClient();
        var html = await httpClient.GetStringAsync(url);
        var htmlDocument = new HtmlDocument();
        htmlDocument.LoadHtml(html); // parse the downloaded markup before querying it

        var links = htmlDocument.DocumentNode.Descendants("div").Where(node => node.GetAttributeValue("class", "").Equals("pull-right")).ToList();

        // Takes the first <a> inside the div, whichever link that happens to be
        var alink = links.First().Descendants("a").FirstOrDefault().ChildAttributes("href")?.FirstOrDefault().Value;

        await Launcher.OpenAsync(alink);


  • Use an XPath expression as a selector:

    var alink = htmlDocument.DocumentNode
        .SelectSingleNode("//li/a[contains(@onclick, 'PDF')]")
        .GetAttributeValue("href", "");

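    Explanation of the XPath (as requested):

    It matches an a element that is a direct child of an li element at any depth in the document, and whose onclick attribute contains the string 'PDF'. Note that SelectSingleNode returns null when nothing matches, so the chained GetAttributeValue call throws if the page has no such link.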
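
A null-safe variant of the selector above might look like the following. This is only a minimal sketch: the FindPdfHref helper, the PdfLinkExample class, and the sample markup (including the /files/example.pdf path) are made up for illustration and do not come from the original page. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class PdfLinkExample
{
    // Hypothetical helper: returns the href of the first <a> that is a direct
    // child of an <li> and whose onclick attribute contains "PDF", or null
    // when there is no such link or its href is empty.
    public static string FindPdfHref(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing.
        var anchor = doc.DocumentNode
            .SelectSingleNode("//li/a[contains(@onclick, 'PDF')]");
        if (anchor == null)
        {
            return null;
        }

        var href = anchor.GetAttributeValue("href", "");
        return string.IsNullOrWhiteSpace(href) ? null : href;
    }

    public static void Main()
    {
        // Made-up markup shaped like the snippet in the question.
        const string sample = @"
<ul class=""list-inline"">
  <li class=""social""><a href=""/user"" class=""hide"">User</a></li>
  <li class=""social"">
    <a onclick=""ga('send', 'event', 'PDF', 'Download', '');""
       href=""/files/example.pdf"" target=""_blank"">PDF</a>
  </li>
</ul>";

        Console.WriteLine(FindPdfHref(sample) ?? "no PDF link found");
    }
}
```

Returning null instead of throwing lets the caller decide what to do when the page layout changes, for example skipping the Launcher.OpenAsync call.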