I have the following:
</div>
<p>
<a href="https://urlIwant.com" data-wpel-link="internal">
<span class="image-holder" style="padding-bottom:149.92679355783%;">
<img loading="lazy" src="https://urlIwant.com" width="683" height="1024" class="alignnone size-full wp-image-200816" />
</span>
</a>
</p>
<p>
<span id="more-20000"></span>
</p>
<p>
<a href="https://urlIwant.com" data-wpel-link="internal">
<span class="image-holder" style="padding-bottom:149.92679355783%;">
<img loading="lazy" src="https://urlIwant.com" width="683" height="1024" class="alignnone size-full wp-image-200833" />
</span>
</a>
</p>
<p>
<a href="https://urlIwant.com" data-wpel-link="internal">
<span class="image-holder" style="padding-bottom:145.71428571429%;">
<img loading="lazy" src="https://urlIwant.com" width="700" height="1020" class="alignnone size-medium wp-image-200834" sizes="(max-width: 700px) 100vw, 700px" />
</span>
</a>
</p>
<p>
<a href="https://urlIwant.com" data-wpel-link="internal">
<span class="image-holder" style="padding-bottom:143.42857142857%;">
<img loading="lazy" src="https://urlIwant.com" width="700" height="1004" class="alignnone size-medium wp-image-200835" 836w" sizes="(max-width: 700px) 100vw, 700px" />
</span>
</a>
</p>
</div>
How can I extract all of the URLs that are inside a paragraph tag, have an href, and contain the class "image-holder"?
I can't figure out how to add the span class to my selector:
try {
    Document doc = Jsoup.connect("https://urltoextractfrom.com").get();
    Elements selections = doc.select("p a[href]");
    for (Element e : selections) {
        System.out.println(e);
    }
} catch (Exception e) {
    e.printStackTrace();
}
If I have understood what you want to extract correctly, you can use this selector:
p a:has(span.image-holder)
That finds all the a elements which descend from a p element, and which contain a span with the class image-holder set.
So in code:
Document document = Jsoup.parse(html);
Elements links = document.select("p a:has(span.image-holder)");
List<String> urls = links.eachAttr("href");
You can use the try.jsoup REPL to quickly iterate on selectors: https://try.jsoup.org/~wvd2VHaJtnr10qEiLS9g_-E6UA8
(If this selects content that you don't want, you can clarify that in your question with examples.)
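For a version you can run without fetching the page, here is a minimal sketch that applies the same selector to an inline HTML string. The class name ImageHolderLinks, the method extractUrls, and the example.com URLs are made up for illustration. I have also added [href] to the selector and used the abs: prefix so that relative links resolve against a base URI; both are optional tweaks, not something your markup requires.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

class ImageHolderLinks {

    // Returns the absolute href of every <a> that sits inside a <p>
    // and contains a <span class="image-holder">.
    // baseUri is a hypothetical parameter used only to resolve relative links.
    static List<String> extractUrls(String html, String baseUri) {
        Document document = Jsoup.parse(html, baseUri);
        Elements links = document.select("p a[href]:has(span.image-holder)");
        return links.eachAttr("abs:href");
    }

    public static void main(String[] args) {
        // Cut-down stand-in for the markup in the question (placeholder URLs).
        String html = "<div>"
                + "<p><a href=\"https://example.com/a.jpg\">"
                + "<span class=\"image-holder\"><img src=\"https://example.com/a.jpg\"></span></a></p>"
                + "<p><span id=\"more-20000\"></span></p>"
                + "<p><a href=\"/b.jpg\">"
                + "<span class=\"image-holder\"><img src=\"/b.jpg\"></span></a></p>"
                + "<p><a href=\"https://example.com/other\">plain text link</a></p>"
                + "</div>";

        for (String url : extractUrls(html, "https://example.com/")) {
            System.out.println(url);
        }
    }
}
```

The plain text link and the "more" span are skipped because neither is an a element wrapping a span.image-holder. If you load the page with Jsoup.connect(...).get() instead, the base URI is set from the request URL, so abs:href should work without passing one explicitly.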