I want to read each <h3> and the text between the <h3> tags, so that I can create a JSON model like title: text,text,text from each h3 and the text that follows it, without the ad.
{
"title": "text,text,text",
"title": "text",
"title": "text",
...
}
How can I do it in this case with Java or Kotlin?
<div class="biri" id="biri">
<h1>Yoksa Birisi mi itti?</h1>
<h3>Title</h3>Text,
<br>Text,
<br>Text.
<h3>Title:</h3>Text
<h3>Title:</h3>Text
<div class="ad">
<div style="max-width:336px;">
<script async src="//pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<ins class="adsbygoogle" style="display:block" data-ad-client="ca-pub-7180771993103993" data-ad-slot="2897611612" data-ad-format="auto"></ins>
<script>
(adsbygoogle = window.adsbygoogle || []).push({});
</script>
</div>
</div>
<h3>Title</h3>Text:
<b>Text:</b> (Text
<br>
</div>
You can get all h3 tags by using Document.select():
Document doc = Jsoup.parse(html);
List<String> h3s = doc.select("h3").stream()
        .map(Element::text)
        .collect(Collectors.toList());
This extracts the text of every h3 tag and collects it into a list. The result is this:
[Title, Title:, Title:, Title]
Besides that, the JSON you want to create will not work as written: the keys in a JSON object have to be unique, so you cannot repeat the same key for every h3.
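One way to pair each heading with the text that follows it is to walk the direct child nodes of the container and start a new entry at every h3. The following is only a minimal sketch under some assumptions: the container is looked up by its id "biri", the ad is always wrapped in an element with the class "ad", and the result is written as a JSON array of title/text objects instead of an object with repeated keys. The names H3Sections, Section, extract and toJson are made up for this illustration, and the hand-written escaping is deliberately simple; a JSON library would be the safer choice in real code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of jsoup.
class H3Sections {

    /** One h3 heading plus the text found between it and the next h3. */
    static final class Section {
        final String title;
        final String text;
        Section(String title, String text) { this.title = title; this.text = text; }
    }

    /** Walks the children of #biri and groups the text under each h3. */
    static List<Section> extract(String html) {
        List<Section> sections = new ArrayList<>();
        Element container = Jsoup.parse(html).getElementById("biri");
        if (container == null) return sections; // assumption: id "biri" marks the container

        String title = null;
        List<String> pieces = new ArrayList<>();
        for (Node node : container.childNodes()) {
            if (node instanceof Element && ((Element) node).tagName().equals("h3")) {
                // A new heading closes the previous section, if there was one.
                if (title != null) sections.add(new Section(title, String.join(" ", pieces)));
                title = ((Element) node).text();
                pieces.clear();
            } else if (title != null) {
                String piece = textOf(node);
                if (!piece.isEmpty()) pieces.add(piece);
            }
        }
        if (title != null) sections.add(new Section(title, String.join(" ", pieces)));
        return sections;
    }

    /** Text of a node; the ad block (assumed to carry class "ad") is skipped. */
    private static String textOf(Node node) {
        if (node instanceof TextNode) return ((TextNode) node).text().trim();
        if (node instanceof Element) {
            Element el = (Element) node;
            return el.hasClass("ad") ? "" : el.text().trim();
        }
        return "";
    }

    /** Serializes the sections as a JSON array of {"title", "text"} objects. */
    static String toJson(List<Section> sections) {
        StringBuilder json = new StringBuilder("[");
        for (int i = 0; i < sections.size(); i++) {
            if (i > 0) json.append(',');
            Section s = sections.get(i);
            json.append("{\"title\":").append(quote(s.title))
                .append(",\"text\":").append(quote(s.text)).append('}');
        }
        return json.append(']').toString();
    }

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    private static String quote(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') out.append('\\').append(c);
            else if (c < 0x20) out.append(String.format("\\u%04x", (int) c));
            else out.append(c);
        }
        return out.append('"').toString();
    }
}
```

Calling H3Sections.toJson(H3Sections.extract(html)) on the page source would then give one object per h3. Using an array keeps headings that share the same text (such as the two "Title:" entries) as separate items, which a single JSON object with unique keys could not do.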