I'm attempting to scrape an .aspx site, which is essentially just a big paginated table along the lines of the one found here: http://data.fingal.ie/ViewDataSets/ (Note: the actual site I'm scraping is behind a paywall, so I can't post the actual link.)
However, the problem is that rather than each page of the table having a unique url, the table changes pages by posting to itself, and then updating the content inside of the table.
The next page button looks like this:
</td>
<td class="dxpButton" onclick="aspxGVPagerOnClick('ctl00_cphProduct_gvList','PBN');" style="cursor:pointer;">
<img class="dxWeb_pNext" src="/DXR.axd?r=1_5-BUdv6" alt="Next" /></td><td style="width:4px;"><div style="height:1px;width:4px;overflow:hidden;">
How would I simulate a click on this button using HtmlUnit?
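You would want to find the <td class="dxpButton"> cell that holds the Next image. Note that in your snippet the pager button is a table cell, not a div, and other pager buttons may share the same class.
The easiest way of doing this would be using XPath: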
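import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlTableCell;

final WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://<<YOUR URL HERE>>");
// Select the dxpButton cell that contains the "Next" image, since other pager cells may use the same class.
final HtmlTableCell nextButton = page.getFirstByXPath("//td[@class='dxpButton'][img[@class='dxWeb_pNext']]");
// click() returns the page shown after the click.
page = nextButton.click();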
This will perform the click. I assume the new table content is loaded through AJAX, in which case you may need to wait for the background JavaScript to finish before reading the table:
while (/* the new element doesn't exist yet, or some other 'completed' condition is not met */) {
    // Wait up to 1000 ms for background JavaScript to catch up.
    webClient.waitForBackgroundJavaScript(1000);
}
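The loop above has no upper bound, so it would spin forever if the expected content never appears. Below is a minimal sketch of a bounded version. The class name PagerWait, the method waitUntil, and the attempt limit are all hypothetical names introduced for illustration, not part of HtmlUnit; the helper only takes a condition and a "pump" action, so the HtmlUnit call is supplied by the caller.

```java
import java.util.function.BooleanSupplier;

public class PagerWait {
    /**
     * Runs pump until done reports true or maxAttempts is used up.
     * With HtmlUnit, pump would typically wrap
     * webClient.waitForBackgroundJavaScript(1000), and done would check
     * for an element that only exists once the next page has rendered.
     *
     * @return true if the condition was met, false if we gave up
     */
    public static boolean waitUntil(BooleanSupplier done, Runnable pump, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (done.getAsBoolean()) {
                return true;
            }
            pump.run();
        }
        // One last check after the final pump.
        return done.getAsBoolean();
    }
}
```

With HtmlUnit this could be called roughly as PagerWait.waitUntil(() -> resultPage.getFirstByXPath("<<XPATH FOR SOMETHING ONLY ON THE NEXT PAGE>>") != null, () -> webClient.waitForBackgroundJavaScript(1000), 10), where resultPage is a final reference to the page returned by click(). If it returns false, the content did not show up within roughly ten seconds and you can fail loudly instead of hanging.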