java html css jsoup

allowing for missing parent in jsoup selector


I want to retrieve books from a website, but the site uses different HTML to show the same thing. On some pages it has a div containing a ul, which in turn contains the li elements, like this:

<div class="book-description">
   <ul>
      <li>info 1</li>
      <li>info 2</li>
      <li>info 3</li>
   </ul>
</div>

To iterate over the li elements I would simply use doc.select("div.book-description > ul > li").

On others it goes directly from div to li, like this:

<div class="book-description">
   <li>info 1</li>
   <li>info 2</li>
   <li>info 3</li>
</div>

The previous selector does not work on this page; I would need to use doc.select("div.book-description > li") instead. Is there a syntax I can use to specify that the ul may be missing?


Solution

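  • Have you tried doc.select("div.book-description li")?

    The space is the descendant combinator, so this matches every li inside the div at any depth, whether or not a ul sits in between. As long as your lists contain no nested lists, this selector should be fine.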
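    The suggestion above can be sketched as follows. This is only an illustration, assuming jsoup is on the classpath; the class name, the helper method, and the sample HTML strings are made up for the example. It also shows one possible alternative for pages that do contain nested lists: a grouped selector (comma) that accepts an li directly under the div or under a ul that is a direct child of the div.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BookDescriptionSelectors {
    // Descendant combinator (space): every li inside the div, at any depth.
    static final String ANY_DEPTH = "div.book-description li";

    // Grouped selector (comma): only an li that is a direct child of the div,
    // or a direct child of a ul that is itself a direct child of the div.
    static final String TOP_LEVEL_ONLY =
            "div.book-description > li, div.book-description > ul > li";

    // Hypothetical helper: parse the HTML and count the elements a selector matches.
    static int count(String html, String selector) {
        Document doc = Jsoup.parse(html);
        return doc.select(selector).size();
    }

    public static void main(String[] args) {
        // Sample markup invented for this sketch, mirroring the two page layouts.
        String withUl =
                "<div class='book-description'><ul><li>info 1</li><li>info 2</li><li>info 3</li></ul></div>";
        String withoutUl =
                "<div class='book-description'><li>info 1</li><li>info 2</li><li>info 3</li></div>";
        // A nested list, to show where the two selectors differ.
        String nested =
                "<div class='book-description'><ul><li>info 1<ul><li>detail</li></ul></li></ul></div>";

        System.out.println(count(withUl, ANY_DEPTH));       // 3
        System.out.println(count(withoutUl, ANY_DEPTH));    // 3
        System.out.println(count(nested, ANY_DEPTH));       // 2 (the nested li is matched too)
        System.out.println(count(nested, TOP_LEVEL_ONLY));  // 1 (only the outer li)
    }
}
```

    If nested lists never occur on the site, the shorter descendant selector is probably the simpler choice.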