Tags: html, xpath, web-scraping, href, htmlunit

HtmlUnit - Unable to get anchors from div


The divs of the HTML page I am targeting look like this:

<div class="white-row1">
  <div class="results">
    <div class="profile">
      <a href="hrefThatIWant.com" class>
        <img src="http://imgsource.jpg" border="0" width="150" height="150" alt>
      </a>
    </div>
  </div>
</div>
<div class="white-row2">
  <!-- same content as the div above -->
</div>

I want to scrape the href from each div and collect them in a list.

This is my current code:

List<HtmlAnchor> profileDivLinks = (List)htmlPage.getByXPath("//div[@class='profile']//@href"); 
for(HtmlAnchor link:profileDivLinks)
{
    System.out.println(link.getHrefAttribute());
}

This is the error I get (it is thrown at the first line of the for statement):

Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.html.DomAttr cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlAnchor 

What do you think the issue is?


Solution

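  • The issue is that your XPath ends in //@href, so it selects the href attributes themselves. getByXPath therefore returns DomAttr nodes, and the for loop then tries to cast each of those attributes to an HtmlAnchor. The solution with the minimal change to your code is to modify the XPath so that it returns the anchor elements instead: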

    htmlPage.getByXPath("//div[@class='profile']//a");
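HtmlUnit is not part of the JDK, so the sketch below reproduces the same distinction with the JDK's built-in javax.xml.xpath API instead, run on a well-formed XHTML copy of the markup from the question. The class and method names and the second href (secondHref.com) are hypothetical, chosen only for illustration. An XPath ending in //a yields element nodes, while one ending in //@href yields attribute nodes; DomAttr is HtmlUnit's attribute-node type, which is why casting it to HtmlAnchor fails.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ProfileLinks {

    // Well-formed XHTML version of the question's markup (second href is hypothetical).
    static final String PAGE =
        "<html><body>"
        + "<div class='white-row1'><div class='results'><div class='profile'>"
        + "<a href='hrefThatIWant.com'><img src='http://imgsource.jpg'/></a>"
        + "</div></div></div>"
        + "<div class='white-row2'><div class='results'><div class='profile'>"
        + "<a href='secondHref.com'><img src='http://imgsource.jpg'/></a>"
        + "</div></div></div>"
        + "</body></html>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Option 1: select the <a> elements, then read each element's href attribute.
    static List<String> hrefsFromAnchors(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "//div[@class='profile']//a", doc, XPathConstants.NODESET);
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element anchor = (Element) nodes.item(i); // element nodes, so this cast is safe
            hrefs.add(anchor.getAttribute("href"));
        }
        return hrefs;
    }

    // Option 2: select the href attributes directly; the results are Attr nodes, not elements.
    static List<String> hrefsFromAttributes(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            "//div[@class='profile']//@href", doc, XPathConstants.NODESET);
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Attr href = (Attr) nodes.item(i); // casting to Element here would throw ClassCastException
            hrefs.add(href.getValue());
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(PAGE);
        System.out.println(hrefsFromAnchors(doc));    // prints [hrefThatIWant.com, secondHref.com]
        System.out.println(hrefsFromAttributes(doc)); // prints [hrefThatIWant.com, secondHref.com]
    }
}
```

In HtmlUnit itself the two options correspond to casting each result of getByXPath("//div[@class='profile']//a") to HtmlAnchor and calling getHrefAttribute(), or keeping //@href and casting each result to DomAttr and calling getValue(). Selecting the anchors is usually the more convenient choice, because the HtmlAnchor object is then available for other operations such as click().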