
Jsoup Document object: how to parse JSON data embedded in an HTML element?


URL : https://www.moneycontrol.com/india/stockpricequote/auto-23-wheelers/bajajauto/BA10

I get the following element from doc.select("#C-12-graph").get(0):

<div id="C-12-graph" style="display: none;">[{"heading":"Revenue","data":[{"year":"2018","value":"25218","formattedValue":"25,218 Cr"},{"year":"2019","value":"30357","formattedValue":"30,357 Cr"},{"year":"2020","value":"29918","formattedValue":"29,918 Cr"},{"year":"2021","value":"27741","formattedValue":"27,741 Cr"},{"year":"2022","value":"33144","formattedValue":"33,144 Cr"}]},{"heading":"Net Profit","data":[{"year":"2018","value":"3931","formattedValue":"3,931 Cr"},{"year":"2019","value":"4577","formattedValue":"4,577 Cr"},{"year":"2020","value":"4890","formattedValue":"4,890 Cr"},{"year":"2021","value":"4550","formattedValue":"4,550 Cr"},{"year":"2022","value":"5586","formattedValue":"5,586 Cr"}]},{"heading":"EPS","data":[{"year":"2018","value":"145.80","formattedValue":"145.80"},{"year":"2019","value":"170.30","formattedValue":"170.30"},{"year":"2020","value":"180.20","formattedValue":"180.20"},{"year":"2021","value":"167.90","formattedValue":"167.90"},{"year":"2022","value":"213.20","formattedValue":"213.20"}]},{"heading":"BVPS","data":[{"year":"2018","value":"705.85","formattedValue":"705.85"},{"year":"2019","value":"802.91","formattedValue":"802.91"},{"year":"2020","value":"748.59","formattedValue":"748.59"},{"year":"2021","value":"942.51","formattedValue":"942.51"},{"year":"2022","value":"1031.89","formattedValue":"1,031.89"}]},{"heading":"ROE","data":[{"year":"2018","value":"20.65","formattedValue":"20.65"},{"year":"2019","value":"21.20","formattedValue":"21.20"},{"year":"2020","value":"24.06","formattedValue":"24.06"},{"year":"2021","value":"17.80","formattedValue":"17.80"},{"year":"2022","value":"20.64","formattedValue":"20.64"}]},{"heading":"Debt to Equity","data":[{"year":"2018","value":"0.00","formattedValue":"0.00"},{"year":"2019","value":"0.00","formattedValue":"0.00"},{"year":"2020","value":"0.00","formattedValue":"0.00"},{"year":"2021","value":"0.00","formattedValue":"0.00"},{"year":"2022","value":"0.00","formattedValue":"0.00"}]}]</div>

From this I would like to extract each heading (Revenue, Net Profit, etc.) and, under each heading, the yearly values.

e.g. heading = Revenue, 2018 = 25218, 2019 = 30357

How can I do this with the Jsoup Document object?


Solution

  • jsoup alone cannot do this: it can extract the element's text, but you need a JSON parser to read the structure inside that text.
    If you run the following code (with the right URL) -

    // Fetch the page and take the text of the hidden div; jsoup returns it as a plain String.
    Document doc = Jsoup.connect(url).get();
    Element data = doc.select("#C-12-graph").get(0);
    String jsonArray = data.text();
    

    then jsonArray holds the JSON array as a String; jsoup treats it as plain text and does not parse it.
    You can parse it with the java-json library, as in the following example (there are plenty of other parsers) -

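    import org.json.JSONArray;
    import org.json.JSONObject;

    // Parse the String first; a plain String has no getJSONObject method.
    JSONArray array = new JSONArray(jsonArray);
    for (int i = 0; i < array.length(); i++) {
        JSONObject jsonobject = array.getJSONObject(i);
        String name = jsonobject.getString("heading");
        System.out.println(name);
    }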
    

    and the output is -

    Revenue
    Net Profit
    EPS
    BVPS
    ROE
    Debt to Equity

    You can read the remaining fields (the "data" array of each entry, with its "year" and "value" strings) the same way.
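
As a rough sketch of reading the yearly values as well, the loop can be nested over each entry's "data" array. This assumes the org.json classes (JSONArray, JSONObject) from the java-json library are on the classpath; the class name GraphDataParser and the helper parse are hypothetical names chosen for illustration, and the sample string is a shortened copy of the div content from the question.

```java
import org.json.JSONArray;
import org.json.JSONObject;

import java.util.LinkedHashMap;
import java.util.Map;

public class GraphDataParser {

    // Hypothetical helper: maps each heading to its year -> value entries,
    // keeping the order in which they appear in the JSON.
    static Map<String, Map<String, String>> parse(String json) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        JSONArray sections = new JSONArray(json);
        for (int i = 0; i < sections.length(); i++) {
            JSONObject section = sections.getJSONObject(i);
            JSONArray data = section.getJSONArray("data");
            Map<String, String> byYear = new LinkedHashMap<>();
            for (int j = 0; j < data.length(); j++) {
                JSONObject point = data.getJSONObject(j);
                // "year" and "value" are JSON strings in the page's data.
                byYear.put(point.getString("year"), point.getString("value"));
            }
            result.put(section.getString("heading"), byYear);
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened sample from the question; in practice pass the text of #C-12-graph.
        String json = "[{\"heading\":\"Revenue\",\"data\":["
                + "{\"year\":\"2018\",\"value\":\"25218\",\"formattedValue\":\"25,218 Cr\"},"
                + "{\"year\":\"2019\",\"value\":\"30357\",\"formattedValue\":\"30,357 Cr\"}]}]";
        parse(json).forEach((heading, byYear) ->
                System.out.println("heading = " + heading + ", " + byYear));
    }
}
```

With the shortened sample this prints heading = Revenue, {2018=25218, 2019=30357}. The values are kept as strings here; convert them with Double.parseDouble if you need numbers.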