Error while using HtmlUnit


When I run this simple code to fetch the contents of a website as text, it throws errors that I can't understand.

import java.io.IOException;
import java.net.MalformedURLException;

import com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException;
import com.gargoylesoftware.htmlunit.ScriptException;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class sd {
    public static void main(String[] args) {
        sd vip=new sd();
        try {
            vip.homePage();
        } catch (Exception e) {
            e.printStackTrace();
        }

        System.out.print("sssss");
    }

    public void homePage() throws Exception, ScriptException {
        final WebClient webClient = new WebClient();
        final HtmlPage page =
                (HtmlPage) webClient.getPage("http://timesofindia.indiatimes.com/");
        String pageAsText = page.asText();
        String pageAsXML = page.asXml();

        // System.out.println(pageAsXML);
        System.out.println("////////////////////output//////////////////////////"); 
        System.out.println(pageAsText);
        // System.out.println(pageAsXML);
        System.out.println("////////////////////output ends//////////////////////////"); 
    }

}

Error that I get:

   ======= EXCEPTION START ========
Exception class=[com.gargoylesoftware.htmlunit.ScriptException]
com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxFunction_write
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)
Caused by: java.lang.RuntimeException: Exception invoking jsxFunction_write
Caused by: com.gargoylesoftware.htmlunit.ScriptException: Exception invoking jsxFunction_write
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:595)

Solution

  • Configure your WebClient so that it does not throw an exception on JavaScript errors:

    webClient.setThrowExceptionOnScriptError(false);

    If that is not enough, have the client emulate Firefox by passing a BrowserVersion when you create the WebClient:

    // Use whichever constant your HtmlUnit version provides:
    webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
    // or
    webClient = new WebClient(BrowserVersion.FIREFOX_10);
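
Putting both suggestions together, a minimal sketch might look like the following. The class name `ScrapeWithoutScriptErrors` and the helper `newLenientClient` are made up for illustration. The sketch assumes an older HtmlUnit 2.x release in which `setThrowExceptionOnScriptError` is called directly on `WebClient` and `BrowserVersion.FIREFOX_3_6` is available. Later releases appear to expose the same setting through `webClient.getOptions()` instead, so check the API of the version you have.

```java
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ScrapeWithoutScriptErrors {

    // Hypothetical helper: builds a client that emulates Firefox and
    // does not throw ScriptException when a page's JavaScript fails.
    static WebClient newLenientClient() {
        // Assumption: this constant exists in your HtmlUnit version;
        // swap in BrowserVersion.FIREFOX_10 if it does not.
        WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
        webClient.setThrowExceptionOnScriptError(false);
        return webClient;
    }

    public static void main(String[] args) throws Exception {
        WebClient webClient = newLenientClient();
        try {
            // Same URL as in the question; requires network access.
            HtmlPage page =
                    (HtmlPage) webClient.getPage("http://timesofindia.indiatimes.com/");
            System.out.println(page.asText());
        } finally {
            // Release the windows and threads held by the client.
            webClient.closeAllWindows();
        }
    }
}
```

With this setup a failing script on the page should no longer abort `getPage`, although any content that the failing script would have written may be missing from the text output.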