I am trying to use HtmlUnit to scrape scores from the BBC Sport website: http://www.bbc.co.uk/sport/football/live-scores
The page initially loads the Premier League scores. There is a dropdown to select other leagues, and clicking the 'Update' button then refreshes the page (presumably via AJAX).
This code works fine to get the updated scores:
long startTime = System.currentTimeMillis();
String titleBar = getTitleBar(page);
HtmlOption option = ukGroupDropdown.getOptionByValue(competition);
ukGroupDropdown.setSelectedAttribute(option, true);
HtmlButton updateButton = (HtmlButton)page.getElementById("filter-nav-submit");
Thread.sleep(1000); // WHY???????
HtmlPage newPage = updateButton.click();
while(titleBar.equals(getTitleBar(newPage))) {
Thread.sleep(100);
}
System.out.println("Took " + (System.currentTimeMillis() - startTime));
return getMatches(newPage);
But if I take out the Thread.sleep before clicking the update button, newPage is never updated. Why could this be? And is there a more robust way to wait than a fixed sleep, something like the titleBar loop (which just reads the text from the title bar, e.g. "Barclays Premier League")?
Maybe the line:
ukGroupDropdown.setSelectedAttribute(option, true);
is performing an asynchronous (AJAX) call, and the
updateButton.click();
line needs to wait for the former to finish.
For example, the button could start out disabled and only become enabled once an item has been selected.
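If that is what is happening, a bounded poll is usually safer than a fixed sleep or an open-ended while loop. The following is only a minimal sketch using the standard library: the class name Poller and the method waitUntil are hypothetical helpers, not part of HtmlUnit, and the timeout values are arbitrary.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not an HtmlUnit class): polls a condition with a deadline.
final class Poller {
    private Poller() {}

    /**
     * Re-checks the condition every intervalMillis until it is true or
     * timeoutMillis has elapsed. Returns true if the condition was met,
     * false on timeout or if the waiting thread was interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison against the deadline.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

With the code from the question, the final loop could then become something like Poller.waitUntil(() -> !titleBar.equals(getTitleBar(newPage)), 10000, 100), which gives up after ten seconds instead of spinning forever. It may also be worth trying webClient.waitForBackgroundJavaScript(...) before the click in place of the fixed Thread.sleep(1000), although whether that covers this particular page is an assumption that needs testing.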