Jsoup get Product Description


I would like to get the product description from this page: https://www.real.de/product/346010948/?id_item_promotion=620332

I think I know which element I need, but I have no clue how to parse it.

This is my parsing code:

@Override
public Product getDescriptionByReal(Product product) {
    String completeUrl = "https://www.real.de/product/" + product.getPlattformProductId() + "/";
    try {
        Document document = Jsoup.connect(completeUrl).get();
        Elements description = document.select("div#prodct-data");
        product.setDesc(description.text());
        return product;

    } catch (IOException e) {
        product.setDesc(e.getMessage());
        return product;
    }
}

If I try document.select("div.rd-product-description__text") or document.select("div#prodct-data"), I get nothing. If I change it to document.select("div"), I get results, but not the data I want.


Solution

  • The product description is loaded asynchronously, after the main document. Jsoup only sees the document as the server delivered it, before any JavaScript modifications. Using the Chrome debugger, I found the URL it is fetched from. You can download this JSON:
    https://www.real.de/pdp-test/api/v1/346010948/product-description/
    and parse it to get the description. Jsoup can't parse JSON, so you'll have to use another library or a simple regular expression to get the part between <div> and </div>.
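
The regular-expression route could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (DescriptionExtractor, fetch, extractDivContent) are made up for this example, the exact shape of the JSON response is not known here, and the pattern simply takes everything from the first <div ...> to the last </div>. It uses the JDK's java.net.http client (Java 11+) for the download so that it has no extra dependencies.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jsoup or of the original code.
final class DescriptionExtractor {

    // Captures everything between the first <div ...> and the last </div>.
    // The closing slash may be JSON-escaped ("<\/div>"), so a backslash is
    // allowed before it. DOTALL lets '.' match across line breaks.
    private static final Pattern DIV_CONTENT =
            Pattern.compile("<div[^>]*>(.*)<\\\\?/div>", Pattern.DOTALL);

    // Downloads the raw response body (the JSON document) as a string.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    // Returns the markup inside the outermost <div>, or empty if none is found.
    static Optional<String> extractDivContent(String body) {
        Matcher matcher = DIV_CONTENT.matcher(body);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }
}
```

The extracted string is still HTML and may still contain JSON escape sequences, so it would need further cleanup. Passing it to Jsoup.parse(...).text() would reduce it to plain text, much like description.text() in the question. If the response turns out to be more complex than a single wrapped <div>, a real JSON library is likely the more robust choice.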