Tags: java, html, jsoup, java-11

Iterate over <div> elements inside a <ul> tag with jsoup (Java)


I'm trying to get all <div> elements inside a <ul> tag using jsoup.

This is the HTML:

<html>
   <head>
      <title>Try jsoup</title>
   </head>
   <body>
      <ul class="product__listing product__grid">
         <div class="product-item">
            <div class="content-thumb_gridpage">
               <a class="thumb" href="index1.html" title="Tittle 1">
            </div>
         </div>
         <div class="product-item">
            <div class="content-thumb_gridpage">
               <a class="thumb" href="index2.html" title="Tittle 2">
            </div>
         </div>
         <div class="product-item">
            <div class="content-thumb_gridpage">
               <a class="thumb" href="index3.html" title="Tittle 3">
            </div>
         </div>
      </ul>
   </body>
</html>

I want to iterate over every <div class="product-item"> so that I can add the attributes of each <a class="thumb"> to a list:

List-product-details
[0] href="index1.html" title="Tittle 1"
[1] href="index2.html" title="Tittle 2"
[2] href="index3.html" title="Tittle 3"

Note that there can be any number of product-item divs.

Here is what I have so far:

Elements productList = sneakerList.select("ul.product__listing product__grid");
Elements product = productList.select("ul.product-item");

for (int i = 0; i < product.size(); i++) {
    Elements productInfo = product.get(i).select("div.product-item").select("div.content-thumb_gridpage").select("a.thumb");
    System.out.format("%s %s %s\n", productInfo.attr("title"), productInfo.attr("href"), productInfo.text());
}

Solution

  • Did you try debugging line by line and checking at which line your code doesn't do what you expect? I see two mistakes.

    1. The first selector, "ul.product__listing product__grid", contains a space. A space is the descendant combinator, so the selector means: find a ul element with class product__listing and, inside it, search for an element <product__grid></product__grid>. You probably meant: select a ul element that has both class product__listing and class product__grid. To do that, put a dot before the second class name and remove the space so that both classes apply to the same element. The correct selector is "ul.product__listing.product__grid".
    2. The second selector, "ul.product-item", returns an empty result. That's because you're already at the ul and the selector asks for a ul that has class product-item, but that class belongs to the div elements inside it. The selector should be relative to where you are, so ".product-item" alone is enough.

    With both selectors fixed, I get the output:

    Tittle 1 index1.html
    Tittle 2 index2.html 
    Tittle 3 index3.html
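
Putting both fixes together, a minimal self-contained sketch might look like the following. It assumes jsoup is on the classpath and uses a trimmed copy of the HTML from the question with the <a> tags closed (an assumption, since the original markup leaves them open). The class name ProductLinks and the method extractLinks are illustrative names, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ProductLinks {

    // Sample markup modelled on the question; the <a> tags are closed here (assumption).
    static final String HTML =
          "<ul class=\"product__listing product__grid\">"
        + "<div class=\"product-item\"><div class=\"content-thumb_gridpage\">"
        + "<a class=\"thumb\" href=\"index1.html\" title=\"Tittle 1\"></a></div></div>"
        + "<div class=\"product-item\"><div class=\"content-thumb_gridpage\">"
        + "<a class=\"thumb\" href=\"index2.html\" title=\"Tittle 2\"></a></div></div>"
        + "<div class=\"product-item\"><div class=\"content-thumb_gridpage\">"
        + "<a class=\"thumb\" href=\"index3.html\" title=\"Tittle 3\"></a></div></div>"
        + "</ul>";

    // Returns one "href title" entry per product-item element (hypothetical helper).
    static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html);

        // Fix 1: both classes sit on the same <ul>, so chain them with dots and no space.
        Elements productList = doc.select("ul.product__listing.product__grid");

        // Fix 2: select relative to the <ul>; the class alone is enough.
        Elements products = productList.select(".product-item");

        List<String> details = new ArrayList<>();
        for (Element product : products) {
            // First <a class="thumb"> inside this product, or null if there is none.
            Element link = product.selectFirst("a.thumb");
            if (link != null) {
                details.add(link.attr("href") + " " + link.attr("title"));
            }
        }
        return details;
    }

    public static void main(String[] args) {
        extractLinks(HTML).forEach(System.out::println);
    }
}
```

For the sample markup above, the list holds three entries, one per product-item, starting with "index1.html Tittle 1". Because the loop runs over whatever the selector matches, it handles any number of product-item divs without changes.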