I have a user-submitted string that contains HTML content such as
"<p></p><div></div><p>Hello<br/>world</p><p></p>"
I would like to transform this string so that empty tag pairs are removed (but self-closing tags like <br/>
are retained). For example, the transformation should convert the string above to
"<p>Hello<br/>world</p>"
I'd like to use JSoup to do this, as I already have it on my classpath, and it would be easiest for me to perform this transformation on the server side.
Here is an example that does just that (using JSoup):
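import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

String html = "<p></p><div></div><p>Hello<br/>world</p><p></p>";
Document doc = Jsoup.parse(html);
// Remove every block-level element that contains no text.
// <br/> is an inline element, so it is left in place.
for (Element element : doc.select("*")) {
    if (!element.hasText() && element.isBlock()) {
        element.remove();
    }
}
System.out.println(doc.body().html());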
The output of the code above is what you are looking for (JSoup re-serializes the <br/> tag, so its exact spelling may differ from the input):
<p>Hello<br />world</p>
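
One caveat: hasText() only looks for text, so the loop above would also remove a block element whose only content is non-text, such as a <div> wrapping an <img>. If you want to keep those, a variant along the following lines might work. It is only a sketch: the class name EmptyTagCleaner and the method removeEmptyBlocks are made up for illustration (they are not part of JSoup), and it assumes that "empty" means "no text and no child elements".

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class EmptyTagCleaner {

    // Hypothetical helper (not a JSoup API): strips block elements that
    // contain neither text nor child elements.
    public static String removeEmptyBlocks(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        // Disable pretty-printing so the output is not re-indented.
        doc.outputSettings().prettyPrint(false);

        // All descendants of <body>, in document order.
        Elements all = doc.select("body *");
        // Walk in reverse document order so children are handled before
        // their parents; a parent emptied by a removal is then caught too.
        for (int i = all.size() - 1; i >= 0; i--) {
            Element el = all.get(i);
            boolean empty = !el.hasText() && el.children().isEmpty();
            if (empty && el.isBlock()) {
                el.remove();
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        System.out.println(removeEmptyBlocks(
                "<p></p><div></div><p>Hello<br/>world</p><p></p>"));
        System.out.println(removeEmptyBlocks(
                "<div><p></p></div><div><img src=\"a.png\"></div>"));
    }
}
```

With the second input, the nested <div><p></p></div> should be dropped entirely, while the <div> that wraps the image should survive because it still has a child element.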