
Jsoup parse HTML including span tags


I have some HTML with the following format:

<article class="cik" id="100">
  <a class="ci" href="/abc/1001/STUFF">
    <img alt="Micky Mouse" src="/images/1001.jpg" />
    <span class="mick vtEnabled"></span>
  </a>

  <div>
    <a href="/abc/1001/STUFF">Micky Mouse</a>
    <span class="FP">$88.00</span>&nbsp;&nbsp;<span class="SP">$49.90</span>
  </div>
</article>

In the HTML above, the <a> tag inside the article contains a <span class="mick vtEnabled"> with no text. I want to check whether a span with these class names is present within each article tag.

How do I do that? I tried select("> a[href] > span.mick vtEnabled") and checked the size of the result. It is always 0 for every article tag, whether or not the span is present. Any suggestions?


Solution

  • Select the individual article tags first, then query within each one:

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    final String test = "<article class=\"cik\" id=\"100\"><a class=\"ci\" href=\"/abc/1001/STUFF\"><img alt=\"Micky Mouse\" src=\"/images/1001.jpg\" /></a><div><a href=\"/abc/1001/STUFF\">Micky Mouse</a><span class=\"FP\">$88.00</span>&nbsp;&nbsp;<span class=\"SP\">$49.90</span></div></article>";
    final Elements articles = Jsoup.parse(test).select("article");
    for (final Element article : articles) {
        // Images inside the link that is a direct child of the article.
        final Elements articleImages = article.select("> a[href] > img[src]");
        for (final Element image : articleImages) {
            System.out.println(image.attr("src"));
        }
        // Links inside the div that is a direct child of the article.
        final Elements articleLinks = article.select("> div > a[href]");
        for (final Element link : articleLinks) {
            System.out.println(link.attr("href"));
            System.out.println(link.text());
        }
        // First price (class FP).
        final Elements articleFPSpans = article.select("> div > span.FP");
        for (final Element span : articleFPSpans) {
            System.out.println(span.text());
        }
        // Second price (class SP); this query must stay inside the article loop.
        final Elements articleSPSpans = article.select("> div > span.SP");
        for (final Element span : articleSPSpans) {
            System.out.println(span.text());
        }
    }
    

    This prints:

    /images/1001.jpg
    /abc/1001/STUFF
    Micky Mouse
    $88.00
    $49.90
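
The selector in the question most likely returns nothing because of the space in "span.mick vtEnabled". In a CSS selector a space is the descendant combinator, so that query looks for an element named vtEnabled nested inside span.mick. To require both classes on the same span, chain them without a space: span.mick.vtEnabled. The sketch below shows one way to run that check per article. The class name SpanCheck, the helper hasMickSpan, and the second article (id 101, without the span) are made up for illustration and are not part of the original post.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class SpanCheck {

    // Hypothetical helper: true if the article's direct <a href> child
    // directly contains a <span> that carries both classes.
    public static boolean hasMickSpan(final Element article) {
        // "span.mick.vtEnabled" chains the two class selectors with no space.
        // With a space ("span.mick vtEnabled") the second token is treated as
        // a descendant element named vtEnabled, which never matches here.
        return !article.select("> a[href] > span.mick.vtEnabled").isEmpty();
    }

    public static void main(final String[] args) {
        // Article 100 is taken from the question; article 101 is an invented
        // counterexample without the span.
        final String html =
            "<article class=\"cik\" id=\"100\">"
            + "<a class=\"ci\" href=\"/abc/1001/STUFF\">"
            + "<img alt=\"Micky Mouse\" src=\"/images/1001.jpg\" />"
            + "<span class=\"mick vtEnabled\"></span></a></article>"
            + "<article class=\"cik\" id=\"101\">"
            + "<a class=\"ci\" href=\"/abc/1002/STUFF\">"
            + "<img alt=\"Other\" src=\"/images/1002.jpg\" /></a></article>";

        for (final Element article : Jsoup.parse(html).select("article")) {
            System.out.println(article.id() + " -> " + hasMickSpan(article));
        }
    }
}
```

With this input the loop should report true for article 100 and false for article 101. If only one of the two classes matters, span.mick on its own would also match.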