HtmlUnit - Dynamic Content not found


I am not having any luck with dynamic content being returned in the HtmlPage object when loading this page: https://www.fangraphs.com/leaders/splits-leaderboards?splitArr=5&strgroup=season&statgroup=1&startDate=2018-03-01&endDate=2018-11-01&filter=IP%7Cgt%7C0&position=P&statType=player&autoPt=true&players=&pg=0&pageItems=30&sort=22,1&splitArrPitch=&splitTeams=false

The "react-drop-test" div is empty. I am trying to find the anchor with the "Export Data" text so I can click it and get the content as a stream.

Any thoughts on what I can do to get the HtmlPage to contain the dynamic content?

Here is a sample of what I have right now. The anchors never return any elements.

    webClient = new WebClient(BrowserVersion.CHROME);
    webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
    webClient.getCookieManager().setCookiesEnabled(false);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
    webClient.setJavaScriptTimeout(jsTimeout);
    updateJSErrorListener(webClient);

    int thisYear = year;
    if (isEarlySeason()) {
        thisYear = year - 1;
    }
    String leftyURL = "https://www.fangraphs.com/leaderssplits.aspx?splitArr=5&strgroup=season&statgroup=1&startDate=" + thisYear + "-03-01&endDate=" + year + "-11-01&filter=IP%7Cgt%7C0&position=P&statType=player&autoPt=true&players=&pg=0&pageItems=30&sort=22,1";

    HtmlPage page = webClient.getPage(leftyURL);

    HtmlAnchor leftyAnchor = null;
    HtmlDivision div = (HtmlDivision) page.getElementById("react-drop-test");
    List<HtmlElement> anchors = div.getElementsByTagName("a");
    for (DomElement anchor : anchors) {
        if ((anchor.getAttribute("class").contains("data-export"))) {
            leftyAnchor = (HtmlAnchor) anchor;
            break;
        }
    }

    Page p = leftyAnchor.click();
    InputStream is = p.getWebResponse().getContentAsStream();
    List<List<String>> leftyCSV = readCSVFile(is);

Solution

  • This is another web page that relies heavily on JavaScript. Let me start with some general hints:

    • do not change the default configuration unless you need to (and know what effect the change will have)
    • because your page (or at least parts of it) is rendered by JavaScript, you have to wait for the scripts to finish at some point

    And finally: you need a newer version of HtmlUnit to get the job done, because the JavaScript implementation is missing one feature that the JavaScript code on this page needs.

    To get the new (SNAPSHOT) version you have these options:

    With the latest code base this will do the job for you:

    // Needs java.util.List plus the HtmlUnit classes WebClient, BrowserVersion, Page,
    // HtmlPage, HtmlDivision, HtmlAnchor, HtmlElement and DomElement.
    String url = "https://www.fangraphs.com/leaders/splits-leaderboards?splitArr=5&strgroup=season&statgroup=1&startDate=2018-03-01&endDate=2018-11-01&filter=IP%7Cgt%7C0&position=P&statType=player&autoPt=true&players=&pg=0&pageItems=30&sort=22,1&splitArrPitch=&splitTeams=false";

    // try-with-resources closes the WebClient when done
    try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_60)) {
        // the only non-default option: do not abort on script errors
        webClient.getOptions().setThrowExceptionOnScriptError(false);

        HtmlPage page = webClient.getPage(url);
        // give the background JavaScript up to 50 seconds to render the page
        webClient.waitForBackgroundJavaScript(50000);
        System.out.println("----------------");
        System.out.println(page.asText());

        HtmlDivision div = (HtmlDivision) page.getElementById("react-drop-test");
        List<HtmlElement> anchors = div.getElementsByTagName("a");
        for (DomElement anchor : anchors) {
            // the "Export Data" link carries the "data-export" class
            if (anchor.getAttribute("class").contains("data-export")) {
                HtmlAnchor leftyAnchor = (HtmlAnchor) anchor;

                // clicking the link returns the exported data as a new Page
                Page p = leftyAnchor.click();
                System.out.println();
                System.out.println("----------------");
                System.out.println(p.getWebResponse().getContentAsString());

                break;
            }
        }
    }
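
The hint about waiting can also be handled by polling for the element you need instead of relying on one long timeout. The following is only a sketch, not part of HtmlUnit: the class `PollingWait` and the method `waitUntil` are hypothetical names introduced here, and it assumes the page's scripts keep running in the background while the calling thread sleeps.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not an HtmlUnit class): polls a condition until it holds or a timeout expires. */
final class PollingWait {

    /**
     * Checks the condition immediately and then every pollMillis milliseconds.
     * Returns true as soon as the condition holds, false on timeout or interruption.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            // Subtraction keeps the comparison valid even if nanoTime wraps around.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true roughly 200 ms after start.
        final long start = System.currentTimeMillis();
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 200, 2_000, 50);
        System.out.println("ready = " + ready);
    }
}
```

In the snippet above, the condition could be something like `() -> !page.getElementById("react-drop-test").getElementsByTagName("a").isEmpty()` with a 50000 ms timeout, so the code continues as soon as the anchors have been rendered. This assumes the div itself is already present in the initial page; if it might be missing, the condition needs a null check first.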