I want to retrieve books from a website, but the site uses different HTML to show the same thing. On some pages there is a div followed by a ul and then the li elements, like this:
<div class="book-description">
<ul>
<li>info 1</li>
<li>info 2</li>
<li>info 3</li>
</ul>
</div>
To iterate over the li elements I would simply do: doc.select("div.book-description > ul > li")
On other pages it goes directly from div to li, like this:
<div class="book-description">
<li>info 1</li>
<li>info 2</li>
<li>info 3</li>
</div>
The previous selector does not work on this page; I would need to use doc.select("div.book-description > li") instead.
Is there a syntax I can use to specify that the ul may be missing?
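Have you tried doc.select("div.book-description li")?
The space is the descendant combinator, so it matches li elements at any depth inside the div, whether or not the ul is present. If your lists contain no nested lists, this selector is fine.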
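As a rough sketch of the suggestion above, assuming doc is a jsoup Document (which the doc.select(...) calls suggest but the question does not state), the following compares the descendant selector against both layouts. The class name, the constants, and the NESTED sample are made up for illustration. The topLevelOnly method is an alternative that is not part of the answer: it groups the two child selectors from the question with a comma, which may help if nested lists do turn up.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

public class BookDescriptionSelector {
    // Layout 1 from the question: li elements wrapped in a ul.
    static final String WITH_UL =
        "<div class=\"book-description\"><ul>"
        + "<li>info 1</li><li>info 2</li><li>info 3</li>"
        + "</ul></div>";

    // Layout 2 from the question: li elements directly inside the div.
    static final String WITHOUT_UL =
        "<div class=\"book-description\">"
        + "<li>info 1</li><li>info 2</li><li>info 3</li>"
        + "</div>";

    // Hypothetical page (not from the question) with a nested list.
    static final String NESTED =
        "<div class=\"book-description\"><ul>"
        + "<li>info 1<ul><li>nested</li></ul></li><li>info 2</li>"
        + "</ul></div>";

    // Descendant combinator: every li at any depth inside the div.
    static Elements anyDepth(String html) {
        return Jsoup.parse(html).select("div.book-description li");
    }

    // Selector group: li that is a child of the div, or of a ul that is
    // itself a child of the div. Nested li elements are skipped.
    static Elements topLevelOnly(String html) {
        return Jsoup.parse(html).select(
            "div.book-description > ul > li, div.book-description > li");
    }

    public static void main(String[] args) {
        System.out.println(anyDepth(WITH_UL).size());
        System.out.println(anyDepth(WITHOUT_UL).size());
        System.out.println(anyDepth(NESTED).size());
        System.out.println(topLevelOnly(NESTED).size());
    }
}
```

On the two layouts from the question both methods should return the same three elements; they only differ once a list is nested inside an li.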