I've written a Java server that scrapes a website, but after a few requests (about 10 or so) I always get an ElementNotFoundException, although the element should be there. Basically, my program checks this website for info every few minutes, but after a few runs it throws that exception.
This is my scraping code. I don't know what's wrong with it that makes the element go missing after a few requests.
final WebClient webClient = new WebClient();
try (final WebClient webClient1 = new WebClient()) {
    final HtmlPage page = webClient.getPage("http://b7rabin.iscool.co.il/מערכתשעות/tabid/217/language/he-IL/Default.aspx");
    WebResponse webResponse = page.getWebResponse();
    String content = webResponse.getContentAsString();
    // System.out.println(content);
    HtmlSelect select = (HtmlSelect) page.getElementById("dnn_ctr914_TimeTableView_ClassesList");
    HtmlOption option = select.getOptionByValue("" + userClass);
    select.setSelectedAttribute(option, true);
    //String jscmnd = "javascript:__doPostBack('dnn$ctr914$TimeTableView$btnChangesTable','')";
    String jscmnd = "__doPostBack('dnn$ctr914$TimeTableView$btnChanges','')";
    ScriptResult result = page.executeJavaScript(jscmnd);
    HtmlPage page1 = (HtmlPage) result.getNewPage();
    String content1 = page1.getWebResponse().getContentAsString();
    //System.out.println(content1);
    System.out.println("-----");
    HtmlDivision getChanges = null;
    String changes = "";
    getChanges = page1.getHtmlElementById("dnn_ctr914_TimeTableView_PlaceHolder");
    changes = getChanges.asText();
    changes = changes.replaceAll("\n", "").replaceAll("\r", "");
    System.out.println(changes);
}
The exception:
Exception in thread "Thread-0" com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[*] attributeName=[id] attributeValue=[dnn_ctr914_TimeTableView_PlaceHolder]
at com.gargoylesoftware.htmlunit.html.HtmlPage.getHtmlElementById(HtmlPage.java:1552)
at scrapper$1.run(scrapper.java:108)
I'm really desperate to solve this; it's the only bottleneck in my project.
You just need to wait a little before manipulating the second page, as hinted here.
So, a sleep() of 3 seconds makes it succeed every time.
HtmlPage page1 = (HtmlPage) result.getNewPage();
Thread.sleep(3_000); // sleep for 3 seconds
String content1 = page1.getWebResponse().getContentAsString();
Also, you don't need two instances of WebClient: create a single one in the try-with-resources statement and use that same instance for getPage().
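If a fixed 3-second sleep feels too blunt, one possible alternative is to poll for a condition with a timeout, so the code continues as soon as the page is ready and gives up cleanly if it never is. The sketch below is only an illustration of that idea in plain Java: `PollingWait` and `waitUntil` are hypothetical names, not part of HtmlUnit, and the condition in `main` is a stand-in for "the element exists".

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of HtmlUnit): polls a condition until it
// becomes true or a timeout expires, instead of sleeping for a fixed time.
final class PollingWait {

    private PollingWait() {
    }

    // Returns true as soon as the condition holds, or false if the timeout
    // elapses first or the waiting thread is interrupted.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        final long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 300 ms after start.
        // In the scraper this would be a check that the element is present.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 300, 3_000, 50);
        System.out.println("ready = " + ready);
    }
}
```

In the scraper, the condition might be something like `() -> page1.getElementById("dnn_ctr914_TimeTableView_PlaceHolder") != null`, assuming the element shows up in page1 once the postback has finished. HtmlUnit's WebClient also has a waitForBackgroundJavaScript(long) method that may be worth trying for the same purpose.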