
Getting the value of the 'href' inside of a div in HTMLAgilityPack in C#


I am trying to grab the value of an "href" attribute. The HTML looks something like this:

          <div class="s_newsbox" style="font-size:12px; vertical-align:middle; overflow: hidden; float:left; margin:10px; margin-bottom:15px; height: 270px; width:280px; border-radius:6px; position:relative; text-align:center; padding:0px">
            <div style="background-color:#292929; background-color:rgba(0,0,0,0.8); padding:5px; padding-left:2px; padding-right:10px; width:100%; position:absolute; top:0; left:0;"><b>Samsung nx30 + zoom kit 18/55</b>
            </div>
            <a href="vendo.php?t=1395911">
              <img style="width:100%; height:100%" src="http://img1.juzaphoto.com/shared_files/uploads_mercatino/sell_1395911_small.jpg" alt="">
              <br></a>
            <div style="line-height:150%; background-color:#292929; background-color:rgba(0,0,0,0.8); padding:5px; position:absolute; bottom:0; left:0; margin-left:auto; width:100%; text-align:left">Venditore: 
              <a href="me.php?l=it&amp;p=45923"><b>Pierobob</b></a>  
              <br> Prezzo: <b>350 &euro;</b>  
              <br> Zona: <b>Bologna</b>  
              <br> 
              <a href="vendo.php?t=1395911">Leggi annuncio</a> (8 visite)
              <br>
            </div>
          </div>

What I am trying to do is this:

           var list = page.DocumentNode.SelectNodes("//div[@class='s_newsbox']");
           foreach (var obj in list)
           {
               var url = obj.SelectSingleNode(".//a").Attributes["href"].Value;
           }

I want to grab the value 'vendo.php?t=1395911', but instead I get the href value of another line, one that doesn't have a parent div with the class 's_newsbox'.

What am I doing wrong?

Thank you!


Solution

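  • You can narrow the selection with a more specific XPath expression, as long as you don't need any of the other nodes inside the s_newsbox div. The expression below matches only the <a> elements that are direct children of that div and have a non-empty href: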

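           // Note: SelectNodes returns null (not an empty collection) when nothing matches.
           var list = page.DocumentNode.SelectNodes("//div[@class='s_newsbox']/a[string-length(@href)>0]");
           foreach (var obj in list)
           {
               // obj is already the <a> node, so read its href directly
               var url = obj.Attributes["href"].Value;
           }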
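
For anyone who wants to try this outside the original page, here is a minimal, self-contained sketch of the same approach. It assumes the HtmlAgilityPack NuGet package is referenced. The HTML string is a trimmed-down, hypothetical stand-in for the markup in the question, and the names `Program`, `html` and `links` are only illustrative.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Hypothetical, shortened version of the markup from the question.
        const string html = @"
<div class='s_newsbox'>
  <div><b>Samsung nx30 + zoom kit 18/55</b></div>
  <a href='vendo.php?t=1395911'><img src='sell_1395911_small.jpg' alt=''><br></a>
  <div>Venditore: <a href='me.php?l=it&amp;p=45923'><b>Pierobob</b></a>
    <br><a href='vendo.php?t=1395911'>Leggi annuncio</a>
  </div>
</div>";

        var page = new HtmlDocument();
        page.LoadHtml(html);

        // Same XPath as in the answer: <a> elements that are direct children
        // of the s_newsbox div and have a non-empty href.
        var links = page.DocumentNode.SelectNodes(
            "//div[@class='s_newsbox']/a[string-length(@href)>0]");

        // SelectNodes returns null when nothing matches, so guard before looping.
        if (links == null)
        {
            Console.WriteLine("No matching links found.");
            return;
        }

        foreach (var link in links)
        {
            // GetAttributeValue returns the fallback value instead of throwing
            // if the attribute is missing.
            var url = link.GetAttributeValue("href", string.Empty);
            Console.WriteLine(url);
        }
    }
}
```

The single slash before `a` is what keeps the nested "Venditore" and "Leggi annuncio" links out of the result, since they sit inside an inner div rather than directly under s_newsbox.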