
WebClient (HtmlUnit) doesn't see some elements


I am trying to parse the Steam Community Market page using "page.asText()", but the item listings are missing from the output. I suspect this happens because the items are loaded by JavaScript roughly a second after the initial HTML has loaded.

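import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class Main {
    public static void main(String[] args) throws Exception {
        // Silence HtmlUnit and Apache HttpClient logging
        java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.OFF);
        java.util.logging.Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.OFF);
        String link = "http://steamcommunity.com/market/search?appid=730#p6_price_asc";
        WebClient webClient = new WebClient(BrowserVersion.CHROME);
        // Fetch the page and print its text content
        HtmlPage page = webClient.getPage(link);
        System.out.println(page.asText());
    }
}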

In the console I see:

Show advanced options...






 < 1 2 3 4 5 6 ... 939 >
 Showing 1-10 of 9389 results

What I expect to see:

Show advanced options...
PRICE
QUANTITY
NAME
31,218
 Starting at:
 $0.35 USD
Operation Hydra Case 
 Counter-Strike: Global Offensive
 276,582
 Starting at:
 $0.23 USD
.
.
.

M4A1-S | Decimator (Field-Tested) 
 Counter-Strike: Global Offensive


 232
 Starting at:
 $27.06 USD

AWP | Asiimov (Battle-Scarred) 
 Counter-Strike: Global Offensive


 28,068
 Starting at:
 $0.75 USD

Krakow 2017 Legends Autograph Capsule 
 Counter-Strike: Global Offensive


 < 1 2 3 4 5 6 ... 940 >
 Showing 1-10 of 9392 results

Solution

  • First of all, make sure JavaScript is enabled:

    webClient.getOptions().setJavaScriptEnabled(true);
    

    To give the remaining elements time to load, I typically pause after calling getPage():

    Thread.sleep(3000);
    

    This gives the page's JavaScript 3 seconds to load the additional content before you read it with asText().

    You can also try any of the other methods listed by other users here:

    HTMLUnit doesn't wait for Javascript
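
A fixed sleep can be longer than needed on a fast connection and too short on a slow one. As a minimal sketch, the wait can instead poll until the content shows up or a timeout expires. `PollingWait` and `waitUntil` are hypothetical names, not part of HtmlUnit, and the condition in `main` is only a timer standing in for a real check on the page.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HtmlUnit: polls a condition until it
// holds or the timeout expires.
final class PollingWait {

    // Returns true as soon as the condition holds, false on timeout or interruption.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for "the listings have appeared": becomes true after about 200 ms.
        boolean loaded = waitUntil(() -> System.currentTimeMillis() - start >= 200, 3000, 50);
        System.out.println("loaded = " + loaded);
    }
}
```

With HtmlUnit, the condition could be something like `() -> page.asText().contains("Starting at:")`, assuming that string only appears once the listings have loaded. HtmlUnit's own `webClient.waitForBackgroundJavaScript(3000)` is another option worth trying before writing a helper like this.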