I have the following code:
WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://www.myland.co.il/%D7%9E%D7%97%D7%A9%D7%91-%D7%94%D7%A9%D7%A7%D7%99%D7%94");
The code fails with com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 404 Not Found for http://www.myland.co.il/Scripts/swfobject_modified.js
I do see in the console output the HTML page I am interested in. Is there a way to supress the exception and get an Html page after all? The page does load correctly in a real browser.
Yes, you can use setThrowExceptionOnFailingStatusCode to ignore failing status codes, something like;
WebClient webClient = new WebClient();
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = webClient.getPage("http://www.myland.co.il/%D7%9E%D7%97%D7%A9%D7%91-%D7%94%D7%A9%D7%A7%D7%99%D7%94");
The default is normally true, which gives the error you're describing.
EDIT: Just in case you're running an old version, with versions of HtmlUnit earlier than 2.11, setThrowExceptionOnFailingStatusCode
can be called on the WebClient itself instead of the options returned by getOptions()
. In 2.11 or later, you should use getOptions()
as above.