I have a problem scraping a site with car ads. I would like to get the advertiser's name from each ad. The main problem is that the name is sometimes marked up in different ways.
1) Name is Kajetan
<div class="seller-box__seller-info">
<small class="seller-box__seller-registration">Sprzedający na OTOMOTO od 2015</small>
<small class="seller-box__seller-type">Osoba prywatna</small>
<h2 class="seller-box__seller-name"> Kajetan </h2>
</div>
2) Name is AS MOTORS Centrum Pojazdów Używanych KIA
<div class="seller-box__seller-info">
<small class="seller-box__seller-registration">Sprzedający na OTOMOTO od 2019</small>
<small class="seller-box__seller-type">Dealer</small>
<h2 class="seller-box__seller-name">
<div class="seller-badge"> <img src="xx.jpg" data-toggle="tooltip" data-placement="bottom" title="" data-original-title="Ten dealer korzysta z pakietu usług Premium Plus" class="">
</div>
<a href="https://asmotorsuzywane.otomoto.pl" title="AS MOTORS Centrum Pojazdów Używanych KIA">AS MOTORS Centrum Pojazdów Używanych KIA</a>
</h2>
</div>
In the first case the solution is easy; I do it like this:
public static String fetchOwnerName (String html) {
    Elements ownerElement = Jsoup.parse(html).getElementsByClass("seller-box__seller-info").select("h2");
    String owner = StringUtils.substringBetween(String.valueOf(ownerElement), "\">", "</h2>");
    return owner;
}
But in the second case the <h2> contains an additional <div>, and what is more, the advertiser's name sits inside an <a href=""> element.
How should I change the fetchOwnerName method to make it universal? I'm using the Jsoup library to parse the HTML page. Thanks for all your suggestions.
You can get the text inside the h2 tags without worrying about the additional tags (i.e. div and a). You just have to call .text():
Elements ownerElement = Jsoup.parse(html).getElementsByClass("seller-box__seller-info").select("h2");
String owner = ownerElement.text();
This will work as long as the h2 tags contain no text other than the advertiser's name.
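For completeness, here is a minimal sketch of a full fetchOwnerName built on the same idea. It assumes the name always lives in the h2 with class seller-box__seller-name, as in both snippets from the question. The class name SellerNameExtractor, the null return for a missing heading, and the shortened sample HTML in main are my own choices for illustration, not anything the site or Jsoup prescribes. It selects a single element with selectFirst, because calling .text() on an Elements collection joins the text of every match.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SellerNameExtractor {

    // Returns the seller name, or null when the heading is missing or empty.
    // Returning null is an assumption of this sketch; adapt it to your needs.
    public static String fetchOwnerName(String html) {
        Document doc = Jsoup.parse(html);
        // Target the name heading directly by tag and class.
        Element nameHeading = doc.selectFirst("h2.seller-box__seller-name");
        if (nameHeading == null) {
            return null;
        }
        // text() collects the text of the element and all of its children,
        // with whitespace normalized and trimmed, so nested div/a tags are fine.
        String name = nameHeading.text();
        return name.isEmpty() ? null : name;
    }

    public static void main(String[] args) {
        // Shortened versions of the two cases from the question.
        String privateSeller =
            "<div class='seller-box__seller-info'>"
            + "<small class='seller-box__seller-type'>Osoba prywatna</small>"
            + "<h2 class='seller-box__seller-name'> Kajetan </h2>"
            + "</div>";
        String dealer =
            "<div class='seller-box__seller-info'>"
            + "<small class='seller-box__seller-type'>Dealer</small>"
            + "<h2 class='seller-box__seller-name'>"
            + "<div class='seller-badge'><img src='xx.jpg' title=''></div>"
            + "<a href='https://asmotorsuzywane.otomoto.pl'>"
            + "AS MOTORS Centrum Pojazdów Używanych KIA</a>"
            + "</h2></div>";

        System.out.println(fetchOwnerName(privateSeller));
        System.out.println(fetchOwnerName(dealer));
    }
}
```

If the badge ever starts carrying visible text, you could narrow the selection to the link first (for example "h2.seller-box__seller-name a") and fall back to the heading text when no link is present.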