Jsoup: get all elements before a certain element / remove all stacked elements after a certain element


This question builds on an earlier one: "Jsoup: get all elements before a certain element / remove all elements after a certain element".

I want to get all .pet elements that come before the .friends-pets element. I tried the solution proposed in the original question, but it gives the wrong result for this use case, where some of the .pet elements are nested inside other elements.

Input:

<div class="pets">
  <div>
    <div class="pet">1</div>
    <div class="pet">2</div>
  </div>
  <div class="pet">3</div>
  <div class="friends-pets">Your friends have these pets:</div>
  <div class="pet">4</div>
  <div>
    <div class="pet">5</div>
    <div class="pet">6</div>
  </div>
</div>

Expected:

<div class="pet">1</div>
<div class="pet">2</div>
<div class="pet">3</div>

Actual:

<div class="pet">1</div>
<div class="pet">2</div>
<div class="pet">3</div>
<div class="pet">5</div>
<div class="pet">6</div>

This happens when I run:

Element petsWrapper = document.selectFirst(".pets");
Elements pets = petsWrapper.select(".pet");
// select middle element
Element middleElement = petsWrapper.selectFirst(".friends-pets");
// remove from "pets" every element that comes after the middle element
pets.removeAll(middleElement.nextElementSiblings());
System.out.println(pets);

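The extra elements remain because nextElementSiblings() only returns elements that share the same parent as the middle element, so the nested .pet elements 5 and 6 are never removed. When I use a CSS selector as suggested in the second answer, like this: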

.pet:not(.friends-pets ~ .pet)

I get this error:

Did not find balanced marker at '.friends-pets ~ .pet'

So I can't really test if it actually works.

Thank you.


Solution

  • My approach would be to select both what you want and what you don't want with one selector. You can join selectors with a comma, which works as an OR operator: an element is selected if it matches either selector. The order of the elements is kept, so you get one flat list of all matching elements in document order, regardless of their parents. Then you can keep only the part of that list that comes before the unwanted element.

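    // one flat list, in document order, of every .pet plus the .friends-pets marker
    Elements goodElementsWithBadElement = document.select(".pet,.friends-pets");
    Element badElement = goodElementsWithBadElement.select(".friends-pets").first();
    // indexOf returns -1 if there is no .friends-pets element, and subList would then throw
    int positionOfBadElement = goodElementsWithBadElement.indexOf(badElement);
    // keep only the elements that come before the marker
    List<Element> onlyWhatYouWant = goodElementsWithBadElement.subList(0, positionOfBadElement);
    System.out.println(onlyWhatYouWant);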
    

    btw I was the author of that previous answer ;)
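
The "cut the list at the marker" step above does not depend on jsoup itself, so it can be pulled out into a small generic helper. The sketch below is only an illustration: the class name `BeforeMarker` and the method `takeBefore` are hypothetical names, not part of jsoup, and the choice to return every item when no marker is found is an assumption you may want to change. Plain strings stand in for the elements so the example runs without any dependency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class BeforeMarker {

    // Hypothetical helper: returns the items that come before the first
    // item matching isMarker. If nothing matches, all items are returned
    // (an assumption made for this sketch, not behaviour taken from jsoup).
    public static <T> List<T> takeBefore(List<T> items, Predicate<? super T> isMarker) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            if (isMarker.test(item)) {
                break; // stop at the marker; everything after it is dropped
            }
            result.add(item);
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-in for the flat, document-ordered result of
        // document.select(".pet,.friends-pets")
        List<String> inDocumentOrder = Arrays.asList(
                "pet 1", "pet 2", "pet 3", "friends-pets", "pet 4", "pet 5", "pet 6");

        System.out.println(takeBefore(inDocumentOrder, s -> s.equals("friends-pets")));
        // prints [pet 1, pet 2, pet 3]
    }
}
```

Because jsoup's Elements is a List<Element>, the same helper should work directly on the selector result, for example `takeBefore(document.select(".pet,.friends-pets"), e -> e.hasClass("friends-pets"))`. That also avoids the second select and the subList call with a possible -1 index.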