How can I select all images that are not inside of a link element?
document.select("a img"); //selects all images inside a link
document.select(":not(a) img"); //images not inside a link (does not work)
The problem is that :not(a) img matches any <img> that has at least one ancestor which is not an <a>. <body>, for example, matches :not(a), so your selector matches nearly every <img> tag in the document. This holds even if the HTML string you pass to Jsoup.parse() has no <body> or <html> tag, because Jsoup generates them automatically.
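To make this concrete, here is a minimal sketch, assuming Jsoup is on the classpath. The class name ImplicitBodyDemo and the helper countMatches are made up for illustration. It parses a fragment with no <html> or <body> and shows that the only <img>, although it sits directly inside an <a>, is still matched by the descendant selector.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical demo class, not part of Jsoup.
class ImplicitBodyDemo {

    // Counts how many <img> elements the descendant selector ":not(a) img" matches.
    static int countMatches(String html) {
        Document document = Jsoup.parse(html);
        return document.select(":not(a) img").size();
    }

    public static void main(String[] args) {
        // A fragment without <html> or <body>; the only <img> is a direct child of <a>.
        String fragment = "<a><img id=\"a-img\"></a>";

        // Print the parsed document to see the wrapper elements Jsoup generated.
        System.out.println(Jsoup.parse(fragment).outerHtml());

        // The <img> is still counted, because its generated <body> ancestor satisfies :not(a).
        System.out.println(countMatches(fragment));
    }
}
```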
Let's assume we have the following HTML:
<html>
<body>
<a><div><img id="a-div-img"></div></a>
<a><img id="a-img"></a>
<img id="img">
</body>
</html>
If you only want to exclude <img> elements that are direct children of an <a>, you can use :not(a) > img as the selector:
Elements images = document.select(":not(a) > img");
The result will be this:
<img id="a-div-img">
<img id="img">
The problem is that this also returns the first <img> of the example (#a-div-img), which is actually inside an <a>, just not as a direct child. If this is enough for your needs, you can go with this solution.
Excluding every <img> nested anywhere inside an <a> is not possible with pure CSS (at least I haven't found a solution yet). But you can simply remove all <a> tags from the document before selecting the <img> tags:
document.select("a").remove();
Elements images = document.select("img");
The result will be just this:
<img id="img">
If you need the original document unmodified, you can call Document.clone() first and work on the copy:
Document tempDocument = document.clone();
tempDocument.select("a").remove();
Elements images = tempDocument.select("img");
This way the original document is never modified.
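Put together with the example HTML, the clone-based approach might look like the sketch below. It also includes an alternative that is not part of the answer above: selecting every <img> and keeping only those with no <a> among their ancestors, which avoids copying the document. The class and method names (ImagesOutsideLinks, viaClone, viaAncestorCheck) are invented for illustration, and the code assumes a reasonably recent Jsoup version that provides Elements.eachAttr().

```java
import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class, not part of Jsoup.
class ImagesOutsideLinks {

    // Approach from the answer: strip every <a> from a clone, then select what is left.
    static List<String> viaClone(Document document) {
        Document tempDocument = document.clone();
        tempDocument.select("a").remove();
        return tempDocument.select("img").eachAttr("id");
    }

    // Alternative (an assumption, not from the answer): keep only images
    // that have no <a> anywhere among their ancestors.
    static List<String> viaAncestorCheck(Document document) {
        List<String> ids = new ArrayList<>();
        for (Element img : document.select("img")) {
            if (!img.parents().is("a")) {
                ids.add(img.id());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<a><div><img id=\"a-div-img\"></div></a>"
                + "<a><img id=\"a-img\"></a>"
                + "<img id=\"img\">"
                + "</body></html>";
        Document document = Jsoup.parse(html);

        System.out.println(viaClone(document));
        System.out.println(viaAncestorCheck(document));
        // Neither method touches the original, so all images are still present here.
        System.out.println(document.select("img").size());
    }
}
```

The ancestor check trades one extra traversal per image for not having to clone the whole document, which may matter for large pages.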