java android jsoup

Jsoup elements don't show up when calling the .text() method


I'm making an Android app and want to scrape a web page that lists motorcycles. When I iterate over the elements of that page without .text(), they are printed with their HTML tags, but as soon as I add the .text() method, everything is printed on a single line in my terminal. My code is below. Thanks in advance.

@Override
protected String doInBackground(Void... voids) {
    String title = "";
    try {
        Document document = Jsoup.connect("https://www.hotcars.com/best-motorcycles-for-beginners/").get();
        Elements elements = document.select("div[class=w-website]").select("div[class=w-content]");

        for (Element element : elements.select("section[class=article-body]")) {
            title = element.select("h2").text();
            System.out.print(title);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}

If I remove .text() from the title, I get my text, but wrapped in HTML tags that I don't need.


Solution

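  • element.select("h2") returns every h2 in the section, and calling .text() on that Elements collection joins the text of all of them into one string. Select the h2 elements directly and call .text() on each one. Try the following as a starting point to get the desired elements: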

    try {
        Document document = Jsoup.connect("https://www.hotcars.com/best-motorcycles-for-beginners/").get();
        // Select every h2 inside the article body, one element per motorcycle.
        Elements h2s = document.select("section[class=article-body] h2");

        for (Element h2 : h2s) {
            // Element.text() returns the text of this single heading only.
            String title = h2.text();
            // Assumes the image block directly follows the heading and is followed by two paragraphs.
            // nextElementSibling() and selectFirst() return null if the page layout differs.
            Element img = h2.nextElementSibling().selectFirst("picture").selectFirst("source");
            String imgSrc = img.attr("data-srcset");
            Element p1 = h2.nextElementSibling().nextElementSibling();
            Element p2 = p1.nextElementSibling();
            String description = p1.wholeText() + System.lineSeparator() + p2.wholeText();
            System.out.println(title);
            System.out.println();
            System.out.println(imgSrc);
            System.out.println();
            System.out.println(description);
            System.out.println("------------------------");
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
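
To see why the original code printed everything on one line, it may help to compare the two calls on a small inline document. The following is only a minimal sketch: the class name HeadingTextDemo, the SAMPLE_HTML markup, and the "Bike A" / "Bike B" headings are made up for illustration and are not taken from the real page. It assumes jsoup is on the classpath and parses a string with Jsoup.parse instead of fetching the site.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class HeadingTextDemo {

    // Hypothetical stand-in for the real page: two headings inside one article section.
    static final String SAMPLE_HTML =
            "<section class=\"article-body\">"
            + "<h2>Bike A</h2><p>First description.</p>"
            + "<h2>Bike B</h2><p>Second description.</p>"
            + "</section>";

    // Elements.text() combines the text of every matched element into one string.
    static String combinedText(String html) {
        Document document = Jsoup.parse(html);
        return document.select("section.article-body h2").text();
    }

    // Calling text() on each Element keeps the headings separate.
    static List<String> separateTitles(String html) {
        Document document = Jsoup.parse(html);
        List<String> titles = new ArrayList<>();
        for (Element h2 : document.select("section.article-body h2")) {
            titles.add(h2.text());
        }
        return titles;
    }

    public static void main(String[] args) {
        // All headings joined on a single line.
        System.out.println(combinedText(SAMPLE_HTML));
        // One heading per line.
        for (String title : separateTitles(SAMPLE_HTML)) {
            System.out.println(title);
        }
    }
}
```

Returning the titles as a list, rather than printing inside the loop, also makes it easier to hand the result back from doInBackground in the Android app instead of returning null.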