Using HtmlUnit to pre-render a JavaScript website (HTML Snapshot)


I'm trying to build a prerenderer powered by HtmlUnit, and I tested it with this URL: https://demo.tutorialzine.com/2009/09/simple-ajax-website-jquery/demo.html#page3

Here's my code:

final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED);
WebClientOptions options = webClient.getOptions();
options.setCssEnabled(true);
webClient.setCssErrorHandler(new SilentCssErrorHandler());
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
//    webClient.setAjaxController(new AjaxController(){
//        @Override
//        public boolean processSynchron(HtmlPage page, WebRequest request, boolean async) {
//            return true;
//        }
//    });
options.setThrowExceptionOnScriptError(false);
options.setThrowExceptionOnFailingStatusCode(false);
options.setRedirectEnabled(false);
options.setAppletEnabled(false);
options.setJavaScriptEnabled(true);
//options.setUseInsecureSSL(true);
options.setTimeout(50000);
webClient.addRequestHeader("Access-Control-Allow-Origin", "*");

HtmlPage page = webClient.getPage(path);

// important!  Give the headless browser enough time to execute JavaScript
// The exact time to wait may depend on your application.
webClient.setJavaScriptTimeout(10000);
webClient.waitForBackgroundJavaScript(10000);
//just wait
for (int i = 0; i < 20; i++) {
    synchronized (page) {
        page.wait(500);
    }
}
String xml = page.asXml();

The problem is that the output HTML does not include the content that should have been fetched by JavaScript.

What could be wrong here?


Solution

  • With HtmlUnit 2.28-SNAPSHOT, the code below does retrieve the dynamically loaded text:

    Donec in massa vel lectus aliquam laoreet nec et turpis. ....

    try (final WebClient webClient = new WebClient(BrowserVersion.BEST_SUPPORTED)) {
        WebClientOptions options = webClient.getOptions();
        options.setCssEnabled(true);
        webClient.setAjaxController(new NicelyResynchronizingAjaxController());
        options.setTimeout(50000);
        webClient.addRequestHeader("Access-Control-Allow-Origin", "*");
    
        HtmlPage page = webClient.getPage("https://demo.tutorialzine.com/2009/09/simple-ajax-website-jquery/demo.html#page3");
    
        // Important: give the headless browser enough time to execute JavaScript.
        // The exact time to wait may depend on your application.
        webClient.setJavaScriptTimeout(10000);
        // Blocks until background JavaScript jobs finish or 10 seconds elapse.
        webClient.waitForBackgroundJavaScript(10000);
        // Extra fixed wait as a safety margin.
        Thread.sleep(10000);
    
        String xml = page.asXml();
        System.out.println(xml);
    }
    

    What else are you missing?
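
Both snippets above rely on fixed waits (the page.wait(500) loop in the question, Thread.sleep(10000) in the answer). One possible alternative is to poll until the expected content shows up and give up after a deadline. The following is only a minimal sketch using the Java standard library: the class SnapshotWait and the method waitUntil are hypothetical names, not part of HtmlUnit, and the condition in main is a stand-in for "the AJAX content has arrived".

```java
import java.util.function.BooleanSupplier;

/** Hypothetical polling helper; a sketch, not an HtmlUnit API. */
class SnapshotWait {

    /**
     * Checks the condition every pollMillis until it returns true
     * or timeoutMillis has elapsed.
     * Returns true if the condition was met, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Subtraction keeps the comparison safe if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true roughly 300 ms after start.
        final long start = System.currentTimeMillis();
        boolean ready = waitUntil(
                () -> System.currentTimeMillis() - start >= 300, 5_000, 50);
        System.out.println("content ready: " + ready);

        // With HtmlUnit, the condition could instead inspect the page
        // (assumed usage, with `page` being the HtmlPage from the answer):
        //   waitUntil(() -> page.asXml().contains("Donec in massa"), 10_000, 500);
    }
}
```

With this approach the snapshot is taken as soon as the text is present rather than after a fixed ten seconds, and a false return value signals that the content never arrived within the timeout.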