java web-crawler htmlunit

HtmlUnit: Cannot read property "push" from undefined


I'm trying to crawl a website using HtmlUnit. Whenever I run it, though, it only outputs the following error:

Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot read property "push" from undefined (https://www.kinoheld.de/dist/prod/0.4.7/widget.js#1)

Now, I don't know much about JS, but I read that push is some kind of array operation. This seems standard to me, and I don't know why HtmlUnit would not support it.

Here is the code I'm using so far:

public static void main(String[] args) throws IOException {
    WebClient web = new WebClient(BrowserVersion.FIREFOX_45);
    web.getOptions().setUseInsecureSSL(true);
    String url = "https://www.kinoheld.de/kino-muenchen/royal-filmpalast/vorstellung/280823/?mode=widget&showID=280828#panel-seats";
    web.getOptions().setThrowExceptionOnFailingStatusCode(false);
    web.waitForBackgroundJavaScript(9000);
    HtmlPage response = web.getPage(url);

    System.out.println(response.getTitleText());
}

What am I missing? Is there a way around this or a way to fix this? Thanks in advance!


Solution

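  • I've encountered a similar problem before. HtmlUnit is designed as a test harness rather than a web scraper, so by default it throws an exception on any JavaScript error in the page. The error itself does not mean that push is unsupported: it means the script called push on a value that was undefined. Are you running the latest version of HtmlUnit?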

    I was able to run your code by adding two lines: a call to setThrowExceptionOnScriptError(false) on the client options (as mentioned in Coffee Converter's answer), and java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF); at the top of the method to disable the log dump. This produced the following output:

    Royal Filmpalast München München | kinoheld.de
    

    Full code is as follows:

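    import java.io.IOException;

    import com.gargoylesoftware.htmlunit.BrowserVersion;
    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    public static void main(String[] args) throws IOException {

        // Silence HtmlUnit's log output.
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF);

        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
        String url = "https://www.kinoheld.de/kino-muenchen/royal-filmpalast/vorstellung/280823/?mode=widget&showID=280828#panel-seats";

        webClient.getOptions().setUseInsecureSSL(true);
        // Do not abort on JavaScript errors in the page.
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);

        HtmlPage response = webClient.getPage(url);
        // Give background JavaScript up to 9 seconds to finish. Called before
        // getPage, there would be no jobs to wait for yet.
        webClient.waitForBackgroundJavaScript(9000);

        System.out.println(response.getTitleText());
    }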
    

    This was run from the command line on Red Hat with HtmlUnit 2.2.1. Hope this helps.
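
One caveat about the logging line, shown here as a minimal sketch rather than a definitive fix: java.util.logging keeps only weak references to loggers, so a level set on a logger that nothing else references can be lost if that logger is garbage-collected before HtmlUnit creates its own loggers. Holding the logger in a static field avoids that. The class and method names below (QuietHtmlUnitLogging, silence, wouldLog) are hypothetical, and the sketch assumes HtmlUnit's log output ends up in java.util.logging, which is what Commons Logging falls back to when no other backend is on the classpath.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnitLogging {

    // Strong reference so the logger (and the level set on it) cannot be
    // garbage-collected; the LogManager itself only holds loggers weakly.
    private static final Logger HTMLUNIT_ROOT = Logger.getLogger("com.gargoylesoftware");

    /** Turns off the "com.gargoylesoftware" logger and, by inheritance, its children. */
    static void silence() {
        HTMLUNIT_ROOT.setLevel(Level.OFF);
    }

    /** Returns true if the named logger would emit a record at the given level. */
    static boolean wouldLog(String loggerName, Level level) {
        return Logger.getLogger(loggerName).isLoggable(level);
    }

    public static void main(String[] args) {
        silence();
        // A child logger has no level of its own, so it inherits OFF from its parent.
        System.out.println(wouldLog("com.gargoylesoftware.htmlunit.javascript", Level.SEVERE));
        // Loggers outside that namespace are unaffected (hypothetical name).
        System.out.println(wouldLog("org.example.other", Level.SEVERE));
    }
}
```

Calling QuietHtmlUnitLogging.silence() at the top of main would replace the one-line Logger call in the code above. With the default logging configuration, the first check should report that nothing is logged under com.gargoylesoftware, while unrelated loggers still log SEVERE records.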