I'm trying to crawl a website using HtmlUnit. Whenever I run it, though, it only outputs the following error:
Caused by: net.sourceforge.htmlunit.corejs.javascript.EcmaError: TypeError: Cannot read property "push" from undefined (https://www.kinoheld.de/dist/prod/0.4.7/widget.js#1)
Now, I don't know much about JS, but I read that push is an array operation. This seems standard to me, and I don't know why HtmlUnit would not support it.
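For context on the message itself: Rhino, the JavaScript engine that HtmlUnit bundles (the net.sourceforge.htmlunit.corejs package in the stack trace), reports "Cannot read property "push" from undefined" when the expression to the left of .push evaluates to undefined. So push itself is supported; the object it is being called on is undefined at that point in widget.js. A rough Java analogue, purely illustrative and with hypothetical names, is calling a method on a null reference:

```java
import java.util.List;

// Illustrative analogue of the Rhino error: the method exists,
// but the object it is called on does not.
final class NullReceiverDemo {

    // Returns true if calling add() on a null list fails with a NullPointerException,
    // mirroring "Cannot read property "push" from undefined" in JavaScript.
    static boolean throwsOnNullReceiver() {
        List<String> queue = null; // plays the role of the undefined JS variable
        try {
            queue.add("item"); // comparable to queue.push("item") in JavaScript
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(throwsOnNullReceiver()); // prints "true"
    }
}
```

The fix in both languages is the same: make sure the receiver exists before the call, or, when you do not control the script (as here), tolerate the error instead of aborting.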
Here is the code I'm using so far:
public static void main(String[] args) throws IOException {
    WebClient web = new WebClient(BrowserVersion.FIREFOX_45);
    String url = "https://www.kinoheld.de/kino-muenchen/royal-filmpalast/vorstellung/280823/?mode=widget&showID=280828#panel-seats";
    HtmlPage response = web.getPage(url);
}
What am I missing? Is there a way around this or a way to fix this? Thanks in advance!
I've encountered a similar problem before. This happens because HtmlUnit is designed as a testing framework rather than a web scraper: by default it throws an exception on any JavaScript error instead of carrying on the way a regular browser does. Are you running the latest version of HtmlUnit?
I was able to run your code by adding the setThrowExceptionOnScriptError(false) line (as mentioned in Coffee Converter's answer), as well as a line at the top of the method to disable the log dump. This yielded the following output:
Royal Filmpalast München München | kinoheld.de
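The exact line used to disable the log dump is not shown here. As an assumption, and not necessarily what was used above, one common approach with HtmlUnit 2.x is to turn off the java.util.logging logger for the com.gargoylesoftware package, which HtmlUnit's classes live under. This only takes effect if commons-logging, which HtmlUnit 2.x logs through, falls back to java.util.logging, i.e. when no Log4j or similar backend is on the classpath. The class and method names below are hypothetical:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: silence HtmlUnit's console output via java.util.logging.
// ASSUMPTION: HtmlUnit logs through commons-logging, which falls back to
// java.util.logging when no other backend (e.g. Log4j) is on the classpath.
final class QuietHtmlUnitLogging {

    // Keep a strong reference: LogManager holds loggers weakly, so a level set
    // on an unreferenced logger can be lost after garbage collection.
    private static final Logger HTMLUNIT_LOGGER = Logger.getLogger("com.gargoylesoftware");

    // Call this first in main(), before creating the WebClient.
    static void disableHtmlUnitLogging() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        disableHtmlUnitLogging();
        // Child loggers with no level of their own inherit OFF from the parent.
        Logger child = Logger.getLogger("com.gargoylesoftware.htmlunit.javascript");
        System.out.println(child.isLoggable(Level.SEVERE)); // prints "false"
    }
}
```

Because java.util.logging levels are inherited down the dot-separated name hierarchy, setting OFF on the package root silences every HtmlUnit logger below it without listing them individually.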
Full code is as follows:
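public static void main(String[] args) throws IOException {
    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_45);
    // Don't throw on JavaScript errors in the page's scripts
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    String url = "https://www.kinoheld.de/kino-muenchen/royal-filmpalast/vorstellung/280823/?mode=widget&showID=280828#panel-seats";
    HtmlPage response = webClient.getPage(url);
    System.out.println(response.getTitleText()); // prints the page title
}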
This was run from the Red Hat command line with HtmlUnit 2.2.1. Hope this helps.