
Scraping an aspx site with HtmlUnit, how does one click on a Javascript button?


I'm attempting to scrape an .aspx site, which is essentially just a big paginated table along the lines of the one found here: http://data.fingal.ie/ViewDataSets/ (Note: the actual site I'm scraping is behind a paywall, so I can't post the actual link.)

However, the problem is that the pages of the table don't have unique URLs. Instead, the table changes pages by posting back to itself and then updating its own content.

The next page button looks like this:

</td>
<td class="dxpButton" onclick="aspxGVPagerOnClick('ctl00_cphProduct_gvList','PBN');" style="cursor:pointer;">
<img class="dxWeb_pNext" src="/DXR.axd?r=1_5-BUdv6" alt="Next" /></td><td style="width:4px;"><div style="height:1px;width:4px;overflow:hidden;">

How would I simulate a click on this button using HtmlUnit?


Solution

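  • You would want to find the <td class="dxpButton"> that holds the "Next" image. A pager may render more than one cell with this class (for example a "Previous" button), so it is safer to match on the image's alt text as well. The easiest way of doing this is with XPath:

    final WebClient webClient = new WebClient();
    HtmlPage page = webClient.getPage("http://<<YOUR URL HERE>>");

    // The button in the question is a <td>, so it maps to HtmlTableCell rather than HtmlDivision.
    final HtmlTableCell nextButton = page.getFirstByXPath("//td[@class='dxpButton'][img[@alt='Next']]");
    // click() fires the onclick handler and returns the page shown after the click.
    page = nextButton.click();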
    

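    This performs the click. I assume that the new table content is loaded through AJAX, in which case click() may return before the table has been updated, and you may want to wait for the script to finish:

    // Placeholder XPath: pick an element that only exists once the next page has loaded.
    final String loadedXPath = "<<XPATH OF AN ELEMENT ON THE NEW PAGE>>";
    // Give up after 10 attempts so a failed update cannot loop forever.
    for (int i = 0; i < 10 && page.getFirstByXPath(loadedXPath) == null; i++) {
        // Wait up to 1000 ms for background JavaScript to catch up.
        webClient.waitForBackgroundJavaScript(1000);
    }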
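
    If you page through many tables, the wait loop above can be factored into a small polling helper. The sketch below is one possible shape for it. Poller and waitUntil are hypothetical names, not part of HtmlUnit, and the helper deliberately has no HtmlUnit dependency: the "is it loaded yet" check and the "wait a bit" step are both passed in by the caller.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of HtmlUnit): repeats a step until a condition holds or time runs out. */
final class Poller {
    private Poller() {}

    /**
     * Checks {@code done}; while it is false, runs {@code step} and checks again.
     *
     * @param done          condition that signals the new content has arrived
     * @param step          what to do between checks, e.g. wait for background JavaScript
     * @param timeoutMillis overall time budget in milliseconds
     * @return true if the condition became true, false if the budget ran out first
     */
    static boolean waitUntil(BooleanSupplier done, Runnable step, long timeoutMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!done.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: the condition never became true
            }
            step.run();
        }
        return true;
    }
}
```

    With HtmlUnit, the condition could be something like () -> nextPage.getFirstByXPath(loadedXPath) != null and the step () -> webClient.waitForBackgroundJavaScript(1000), where nextPage is a final variable holding the result of click() (lambdas can only capture variables that are not reassigned). Checking the return value lets you tell a timeout apart from a successful page change.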