java web-scraping webclient htmlunit

Java HTMLUnit WebClient ScriptException errors


I am using HtmlUnit (version 2.19) to scrape a website. I know this is a duplicate question, but believe me, I have tried all the solutions I found on Google and I am still getting this exception. Please see the exception below:

com.gargoylesoftware.htmlunit.ScriptException: ReferenceError: "jQuery" is not defined. (URL/lib/dropdown/core.js#3)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:954) [htmlunit-2.19.jar:2.19]
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:628) [htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513) [htmlunit-core-js-2.17.jar:na]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:836) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:812) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:997) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:399) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:277) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:293) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:799) [htmlunit-2.19.jar:2.19]
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source) [xercesImpl-2.11.0.jar:na]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:756) [htmlunit-2.19.jar:2.19]
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206) [nekohtml-1.9.22.jar:na]
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330) [nekohtml-1.9.22.jar:na]
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499) [nekohtml-1.9.22.jar:1.9.22]
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452) [nekohtml-1.9.22.jar:1.9.22]
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source) [xercesImpl-2.11.0.jar:na]
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1039) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:252) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:198) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:271) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:159) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:478) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:352) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:417) [htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:402) [htmlunit-2.19.jar:2.19]
    at com.company.dashboard.service.impl.ReverseServiceImpl.loginToAds(ReverseServiceImpl.java:447) [classes/:na]
    at com.company.dashboard.service.impl.ReverseServiceImpl.loginToAds(ReverseServiceImpl.java:462) [classes/:na]
    at com.company.dashboard.service.impl.ReverseServiceImpl.getKeyword(ReverseServiceImpl.java:502) [classes/:na]
    at com.company.dashboard.service.impl.ReverseServiceImpl.handleReverseBySetting(ReverseServiceImpl.java:879) [classes/:na]
    at com.company.dashboard.thread.ConCurrentRunnable.run(ConCurrentRunnable.java:44) [classes/:na]
    at com.company.dashboard.thread.CustomThreadPool$WorkerThread.run(CustomThreadPool.java:53) [classes/:na]
Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: ReferenceError: "jQuery" is not defined. (URL/lib/dropdown/core.js#3)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3935) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3919) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.notFoundError(ScriptRuntime.java:3996) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.name(ScriptRuntime.java:1846) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1627) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411) [htmlunit-core-js-2.17.jar:na]
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309) ~[htmlunit-2.19.jar:2.19]
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3286) ~[htmlunit-core-js-2.17.jar:na]
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115) ~[htmlunit-core-js-2.17.jar:na]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:827) ~[htmlunit-2.19.jar:2.19]
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:939) [htmlunit-2.19.jar:2.19]
    ... 36 common frames omitted

2019-07-13 11:06:01.078  INFO 5686 --- [       Thread-4] c.g.h.javascript.JavaScriptEngine        : Caught script exception

I researched this on Google, found many suggested solutions for this exception, and tried all of them, but none of them worked.

These are the solutions I have applied:

Solution 1:

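// imports needed by this snippet (HtmlUnit 2.19)
import java.net.MalformedURLException;
import java.net.URL;

import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.IncorrectnessListener;
import com.gargoylesoftware.htmlunit.InteractivePage;
import com.gargoylesoftware.htmlunit.ScriptException;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HTMLParserListener;
import com.gargoylesoftware.htmlunit.javascript.JavaScriptErrorListener;

WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);

// Discard "incorrectness" notices reported by HtmlUnit
webClient.setIncorrectnessListener(new IncorrectnessListener() {
    @Override
    public void notify(String message, Object origin) {
        // intentionally empty: ignore the notice
    }
});

// Swallow CSS parser errors instead of reporting them
webClient.setCssErrorHandler(new SilentCssErrorHandler());

// Discard every JavaScript problem passed to the listener
webClient.setJavaScriptErrorListener(new JavaScriptErrorListener() {
    @Override
    public void scriptException(InteractivePage page, ScriptException scriptException) {
        // ignore script errors such as the ReferenceError above
    }

    @Override
    public void timeoutError(InteractivePage page, long allowedTime, long executionTime) {
        // ignore scripts that exceed the allowed execution time
    }

    @Override
    public void malformedScriptURL(InteractivePage page, String url,
            MalformedURLException malformedURLException) {
        // ignore script URLs that cannot be parsed
    }

    @Override
    public void loadScriptError(InteractivePage page, URL scriptUrl, Exception exception) {
        // ignore external scripts that fail to load
    }
});

// Discard HTML parser errors and warnings
webClient.setHTMLParserListener(new HTMLParserListener() {
    @Override
    public void error(String message, URL url, String html, int line, int column, String key) {
        // ignore parser errors
    }

    @Override
    public void warning(String message, URL url, String html, int line, int column, String key) {
        // ignore parser warnings
    }
});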

Solution 2:

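   webClient.getOptions().setCssEnabled(false);                         // skip CSS processing
   webClient.getOptions().setJavaScriptEnabled(true);                   // the site needs JavaScript
   webClient.getOptions().setThrowExceptionOnScriptError(false);        // don't throw on JS errors
   webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);  // don't throw on error status codes
   webClient.getOptions().setPrintContentOnFailingStatusCode(false);    // don't print the body of failing responses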

Other answers suggest setJavaScriptEnabled(false), but I need JavaScript enabled: without it I cannot scrape the site.

Is something missing in my code?


Solution

  • Without knowing the page and more details about your code, I can only give some general advice:

    • Your HtmlUnit version is really outdated (2.19 is from Nov 12, 2015) and we are now at 2.35.0. Please use the latest version.
    • Check the console log of a real browser to see whether the same error shows up there as well.
    • webClient.getOptions().setThrowExceptionOnScriptError(false); changes HtmlUnit's behavior so that it no longer throws an exception when an unhandled JavaScript exception is detected. This is more or less how real browsers handle JavaScript exceptions. But (again like real browsers) HtmlUnit still logs these exceptions. If you don't want to be informed about this problem, you have to configure the logger.
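As a rough illustration of "configure the logger", here is a minimal sketch using only java.util.logging. It assumes HtmlUnit's commons-logging ends up delegating to java.util.logging (commons-logging typically falls back to it when Log4j is absent and no SLF4J bridge replaces it). The class `HtmlUnitLogConfig` and its method names are hypothetical. The logger name is taken from the log line in the question (`c.g.h.javascript.JavaScriptEngine` is the abbreviated form of `com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine`), and since "Caught script exception" is logged at INFO there, raising the threshold to WARNING should be enough to hide it.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/**
 * Hypothetical helper: quiets HtmlUnit log output, ASSUMING commons-logging
 * delegates to java.util.logging (no Log4j / SLF4J / Logback bridge in use).
 */
public final class HtmlUnitLogConfig {

    // Parent of all HtmlUnit loggers, e.g. ...htmlunit.javascript.JavaScriptEngine
    static final String HTMLUNIT_ROOT = "com.gargoylesoftware.htmlunit";

    // JUL keeps only weak references to loggers; hold a strong one so the
    // configured level is not lost when the logger is garbage collected.
    private static final Logger HTMLUNIT_LOGGER = Logger.getLogger(HTMLUNIT_ROOT);

    private HtmlUnitLogConfig() {
    }

    /** Hides INFO messages such as "Caught script exception" but keeps warnings. */
    public static void hideScriptExceptionInfo() {
        HTMLUNIT_LOGGER.setLevel(Level.WARNING);
    }

    /** Silences every logger below the HtmlUnit package prefix. */
    public static void silenceAll() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        hideScriptExceptionInfo();
        // Child loggers inherit the parent's level unless they set their own.
        Logger engine = Logger.getLogger(HTMLUNIT_ROOT + ".javascript.JavaScriptEngine");
        System.out.println(engine.isLoggable(Level.INFO));    // false
        System.out.println(engine.isLoggable(Level.WARNING)); // true
    }
}
```

Call `HtmlUnitLogConfig.hideScriptExceptionInfo()` once before creating the `WebClient`. Note that the log line in the question looks like Spring Boot's default Logback output; in that kind of setup the usual equivalent is a level entry such as `logging.level.com.gargoylesoftware.htmlunit=WARN` in `application.properties` rather than JUL code.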