I'm trying to download some images from a website. They are stored in a table, inside div elements. I'm using Java with the HtmlUnit library, and this is what I have so far:
_page = (HtmlPage) linkToPicsPage.click();
List<HtmlElement> _divList = _page.getElementsByIdAndOrName("imgcontainer");
int num = 0;
for (HtmlElement el : _divList) {
    InputStream is = el.click().getWebResponse().getContentAsStream();
    File path = new File(_downloadPath+_car.getRegNumber());
    if (!path.exists())
        path.mkdir();
    writeToFile(is,new File(_downloadPath+_car.getRegNumber()+System.getProperty("file.separator")+_car.getRegNumber()+"["+num+"].jpg"));
    num++;
}
The website code looks like this:
<table id="ctl00_ContentPlaceContenido_GridImagenes" cellspacing="0" border="0" style="border-collapse:collapse;">
<tr>
<td>
<div id="imgcontainer">
<div class="imgitem">
<a href="descarga.aspx?IDOWNER=40312&ID=598477&Action=View">
<img alt="Foto Frente Izquierda" border="0" src="imgthumb.aspx?IDOWNER=40312&ID=598477&Action=View"/>
</a>
<br />
Foto Frente Izquierda
</div>
</div>
</td><td>
But what I'm downloading is HTML code instead of the images themselves. I don't know how to get the href attribute from the HtmlDivision elements that I get in "_divList". Any suggestions?
Thanks
Edit1:
This is the code I'm currently using to download them. The problem is that it downloads some elements I don't need (I'm downloading everything that has "descarga.aspx" in the href). That's why I want to be more specific and download only the images. As you can see, the HtmlAnchors that I get by searching for "descarga.aspx" are not redirecting me to another page:
List<HtmlAnchor> picsLinks = _page.getAnchors();
int num = 0;
for (HtmlAnchor currentPic : picsLinks) {
    if (currentPic.getHrefAttribute().contains("descarga.aspx")) {
        InputStream is = currentPic.click().getWebResponse().getContentAsStream();
        File path = new File(_downloadPath+_car.getRegNumber());
        if (!path.exists())
            path.mkdir();
        writeToFile(is,new File(_downloadPath+_car.getRegNumber()+System.getProperty("file.separator")+_car.getRegNumber()+"["+num+"].jpg"));
        _log.append("....Downloaded picture "+regNumber+num+".jpg\n");
        num++;
    }
    _log.setCaretPosition(_log.getDocument().getLength());
}
I can't say without seeing the whole site, but I suspect it has something to do with clicking on the "imgcontainer" div, which contains more than the image. What happens when you manually click on the words "Foto Frente Izquierda" in a browser?
Try clicking on the image directly, using getByXPath and something like "//div[@class='imgitem']/a" (off the top of my head) instead of getElementsByIdAndOrName.
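To sanity-check that XPath before wiring it into HtmlUnit, here is a minimal sketch that evaluates it with the JDK's own javax.xml.xpath API (not HtmlUnit) against a well-formed copy of the markup from the question. The class name ImgItemLinks, the SAMPLE constant, and the example.com page URL are made up for illustration, and the '&' characters had to be escaped as '&amp;' so the JDK XML parser accepts the snippet. Appending /@href to the expression selects the link target itself rather than the anchor element.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ImgItemLinks {

    // Well-formed copy of the markup from the question ('&' escaped as '&amp;').
    static final String SAMPLE =
          "<div id=\"imgcontainer\">"
        + "<div class=\"imgitem\">"
        + "<a href=\"descarga.aspx?IDOWNER=40312&amp;ID=598477&amp;Action=View\">"
        + "<img alt=\"Foto Frente Izquierda\" border=\"0\""
        + " src=\"imgthumb.aspx?IDOWNER=40312&amp;ID=598477&amp;Action=View\"/>"
        + "</a><br/>Foto Frente Izquierda</div></div>";

    // Returns the href of every <a> that is a direct child of <div class="imgitem">.
    static List<String> imageHrefs(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//div[@class='imgitem']/a/@href", doc, XPathConstants.NODESET);
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            hrefs.add(nodes.item(i).getNodeValue());
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical page address, used only to show how a relative href resolves.
        URI pageUri = URI.create("http://example.com/fotos/lista.aspx");
        for (String href : imageHrefs(SAMPLE)) {
            System.out.println(pageUri.resolve(href));
        }
    }
}
```

In HtmlUnit itself, the anchor-level expression "//div[@class='imgitem']/a" can be passed to _page.getByXPath(...), and getHrefAttribute() (already used in the Edit1 code) reads the link target from each HtmlAnchor that comes back. The real page is probably not well-formed XML, so the JDK parser above is only good for checking the expression, not for scraping the site.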