The divs of the HTML page I am targeting look like this:
<div class="white-row1">
<div class="results">
<div class="profile">
<a href="hrefThatIWant.com" class>
<img src="http://imgsource.jpg" border="0" width="150" height="150" alt>
</a>
</div>
</div>
</div>
<div class="white-row2">
// same content as the div above
</div>
I want to scrape the href from each of these divs and collect them in a list.
This is my current code:
List<HtmlAnchor> profileDivLinks = (List)htmlPage.getByXPath("//div[@class='profile']//@href");
for(HtmlAnchor link:profileDivLinks)
{
System.out.println(link.getHrefAttribute());
}
This is the error I am receiving (it is thrown on the first line of the for statement):
Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.html.DomAttr cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlAnchor
What do you think the issue is?
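The issue is that you're selecting an attribute and then casting that attribute to an anchor. Your XPath ends in //@href, so getByXPath returns DomAttr objects, not HtmlAnchor elements. The unchecked (List) cast hides the mismatch until the for loop tries to assign each element to an HtmlAnchor, which is why the exception points at that line. The solution with the minimal change to your code is to modify the XPath so that it returns the anchors themselves: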
List<HtmlAnchor> profileDivLinks = (List)htmlPage.getByXPath("//div[@class='profile']//a");
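To see the difference between the two XPath expressions without HtmlUnit, here is a minimal sketch that uses the JDK's built-in XPath API as a stand-in. It is not HtmlUnit code: the class name XPathNodeTypes, the helper methods, and the simplified sample markup (with made-up href values) are assumptions for illustration only. The point is the same, though: an expression ending in //@href yields attribute nodes, while one ending in //a yields elements whose href you can then read.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathNodeTypes {

    // Simplified, well-formed stand-in for the page in the question
    // (hypothetical href values).
    static final String SAMPLE =
        "<html><body>"
        + "<div class='white-row1'><div class='results'><div class='profile'>"
        + "<a href='first.example'><img src='a.jpg'/></a>"
        + "</div></div></div>"
        + "<div class='white-row2'><div class='results'><div class='profile'>"
        + "<a href='second.example'><img src='b.jpg'/></a>"
        + "</div></div></div>"
        + "</body></html>";

    // Parses SAMPLE and evaluates the given XPath expression as a node set.
    static NodeList select(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));
            return (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Selects the anchor elements, then reads href from each one.
    static List<String> hrefs() {
        NodeList anchors = select("//div[@class='profile']//a");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            result.add(anchor.getAttribute("href"));
        }
        return result;
    }

    public static void main(String[] args) {
        // The original expression selects attribute nodes, not elements.
        Node first = select("//div[@class='profile']//@href").item(0);
        System.out.println("is attribute: " + (first instanceof Attr));
        System.out.println("is element:   " + (first instanceof Element));

        // The corrected expression selects the anchors themselves.
        System.out.println(hrefs());
    }
}
```

In HtmlUnit the same distinction shows up as DomAttr versus HtmlAnchor, so once the expression selects the anchors, the getHrefAttribute() call in your loop works as written.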