
jsoup: How to select parent nodes that have children satisfying a condition


Here's part of the HTML (simplified for the question):

<a href="/auctions?id=4672" class="auction sec"> 
 <div class="progress"> 
  <div class="guarantee"> 
   <img src="/img/ico/2.png" /> 
  </div> 
 </div> </a>
<a href="/auctions?id=4670" class="auction">  
 <div class="progress"> 
  <div class="guarantee"> 
   <img src="/img/ico/1.png" /> 
  </div> 
 </div> </a>

What I want is a vector containing the ids of the auctions for which the 2.png image is displayed (id=4672 in this case). How do I construct the Selector query to obtain this?

In the Selector documentation (http://jsoup.org/apidocs/org/jsoup/select/Selector.html) I can only find how to select children, not parents.

Any help is appreciated, including suggestions to use other libraries. I tried jsoup because it seemed to be the most popular.


Solution

  • You can use the parent() method to walk up from each img element:

    final String html = "<a href=\"/auctions?id=4672\" class=\"auction sec\"> \n"
            + " <div class=\"progress\"> \n"
            + "  <div class=\"guarantee\"> \n"
            + "   <img src=\"/img/ico/2.png\" /> \n"
            + "  </div> \n"
            + " </div> </a>\n"
            + "<a href=\"/auctions?id=4670\" class=\"auction\">  \n"
            + " <div class=\"progress\"> \n"
            + "  <div class=\"guarantee\"> \n"
            + "   <img src=\"/img/ico/1.png\" /> \n"
            + "  </div> \n"
            + " </div> </a>";
    
    Document doc = Jsoup.parse(html);
    
    for( Element element : doc.select("img") ) // Select all 'img' tags
    {
        Element divGuarantee = element.parent(); // Get parent element of 'img'
        Element divProgress = divGuarantee.parent(); // Get parent of parent etc.
    
        // ...
    }
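
If walking up with parent() feels brittle, one possible alternative is to let the selector do the filtering. The sketch below assumes a jsoup version that supports the `:has(selector)` pseudo-selector and the `[attr$=suffix]` attribute matcher, both of which are listed in the Selector documentation linked in the question. The class name `AuctionIdFinder`, the method `findAuctionIds`, and the regular expression used to pull the id out of the href are illustrative choices, not part of jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, named for illustration only.
final class AuctionIdFinder {

    // Assumption: the id is always passed as an "id" query parameter made of digits.
    private static final Pattern ID_PARAM = Pattern.compile("[?&]id=(\\d+)");

    // Returns the ids of all auction links that contain an image whose src ends with "/2.png".
    static List<String> findAuctionIds(String html) {
        Document doc = Jsoup.parse(html);
        List<String> ids = new ArrayList<>();

        // Select the parent <a> directly: any a.auction with a matching img descendant.
        for (Element link : doc.select("a.auction:has(img[src$=/2.png])")) {
            Matcher m = ID_PARAM.matcher(link.attr("href"));
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String html = "<a href=\"/auctions?id=4672\" class=\"auction sec\">"
                + "<div class=\"progress\"><div class=\"guarantee\">"
                + "<img src=\"/img/ico/2.png\" /></div></div></a>"
                + "<a href=\"/auctions?id=4670\" class=\"auction\">"
                + "<div class=\"progress\"><div class=\"guarantee\">"
                + "<img src=\"/img/ico/1.png\" /></div></div></a>";

        System.out.println(findAuctionIds(html)); // prints [4672]
    }
}
```

The selector matches the `<a>` elements themselves, so the loop does not depend on how many div levels sit between the link and the image. The leading slash in `/2.png` keeps file names such as 12.png from matching.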