
Jsoup returns a document even when part of the page has not finished loading


 doc = Jsoup.connect("https://www.valueresearchonline.com/stocks/42508/icici-bank-ltd/?").timeout(10000).userAgent("Mozilla").get();

This call succeeds even when part of the page https://www.valueresearchonline.com/stocks/42508/icici-bank-ltd/? is still loading.

  • Throttle the network to slow 3G and you will see a spinner for the tables Valuation, Growth & Efficiency

How can I make Jsoup wait until the whole page has loaded, so that it does not fetch partial data?


Solution

  • There is nothing wrong with your jsoup code. The data you are looking for is fetched by a separate XHR request after the page loads. jsoup does not execute JavaScript, so it never loads that data.
    The data is available as JSON at https://www.valueresearchonline.com/stocks/overview/42508, which you can download and process.
    Example code with explanations in the comments:

    // Requires the jsoup and org.json libraries on the classpath
    import org.json.JSONArray;
    import org.json.JSONObject;
    import org.jsoup.Jsoup;

    String url = "https://www.valueresearchonline.com/stocks/overview/42508";
    // ignoreContentType(true) is required; otherwise jsoup rejects the JSON content type.
    // execute().body() returns the raw response text without parsing it as HTML.
    String body = Jsoup.connect(url).ignoreContentType(true).execute().body();
    // Convert the text to a JSON object
    JSONObject json = new JSONObject(body);
    // Get the two arrays that hold your data
    JSONArray valuation = json.getJSONArray("valuation_overview_table_data");
    JSONArray growth = json.getJSONArray("growth_overview_table_data");
    System.out.println(valuation);
    System.out.println(growth);
    

    To find the URL of the data, I searched through the JavaScript files loaded by the page until I found it in the file script-v2__slash__stocks__slash__42621__slash__.js.
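
The page URL in the question and the JSON URL in this answer share the ID 42508, so the data URL for another stock can probably be derived from its page URL. Here is a minimal sketch using only the Java standard library. It assumes that every stock page follows the /stocks/<id>/<slug>/ layout and that the /stocks/overview/<id> pattern holds for other IDs, neither of which is confirmed by the site. The class name OverviewUrl and the method fromPageUrl are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverviewUrl {
    // Matches the numeric stock ID that directly follows "/stocks/" in a page URL.
    private static final Pattern STOCK_ID = Pattern.compile("/stocks/(\\d+)(?:/|$)");

    // Hypothetical helper: builds the JSON data URL from a stock page URL.
    // Assumption: the /stocks/overview/<id> pattern applies to every stock.
    static String fromPageUrl(String pageUrl) {
        Matcher m = STOCK_ID.matcher(pageUrl);
        if (!m.find()) {
            throw new IllegalArgumentException("No stock ID in URL: " + pageUrl);
        }
        return "https://www.valueresearchonline.com/stocks/overview/" + m.group(1);
    }

    public static void main(String[] args) {
        // Prints https://www.valueresearchonline.com/stocks/overview/42508
        System.out.println(fromPageUrl(
            "https://www.valueresearchonline.com/stocks/42508/icici-bank-ltd/?"));
    }
}
```

The returned string can be passed to Jsoup.connect together with ignoreContentType(true), as in the code above. If the site changes its URL layout, the helper throws an IllegalArgumentException instead of building a wrong URL.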