Tags: java, regex, jsoup

Using JSoup to parse text between two different tags


I have the following HTML...

<h3 class="number">
<span class="navigation">
6:55 <a href="/results/result.html" class="under"><b>&raquo;</b></a>
</span>This is the text I need to parse!</h3>

I can use the following code to extract the text from the h3 tag.

Element h3 = doc.select("h3").get(0);
String text = h3.text();

Unfortunately, that gives me everything in that tag.

6:55 » This is the text I need to parse!

Can I use Jsoup to parse between different tags? Is there a best practice for doing this (regex?)


Solution

  • (regex?)

    No. As the answers to this question explain, you can't reliably parse HTML with a regular expression.

    Try this:

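    // Take the first <h3> and its full text, including the text of child elements.
    Element h3 = doc.select("h3").get(0);
    String h3Text = h3.text();
    // Text of the nested <span>, which should be dropped from the result.
    String spanText = h3.select("span").get(0).text();
    // Remove the span's text, then trim the whitespace left behind.
    String textBetweenSpanEndAndH3End = h3Text.replace(spanText, "").trim();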
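
If the wanted text always sits directly inside the h3 and not in a nested element, a possibly simpler alternative is Jsoup's Element.ownText(), which returns only the text owned by the element itself and skips the text of its children. The sketch below is an illustration, not part of the original answer: the class name OwnTextExample and the helper headingOwnText are hypothetical, and the sample markup is modeled on the HTML in the question.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical example class; only Jsoup.parse, select, first and ownText are Jsoup API.
class OwnTextExample {

    // Returns the text that belongs directly to the first <h3>,
    // ignoring text inside child elements such as the <span>.
    static String headingOwnText(String html) {
        Document doc = Jsoup.parse(html);
        Element h3 = doc.select("h3").first(); // null when no <h3> exists
        return h3 == null ? "" : h3.ownText();
    }

    public static void main(String[] args) {
        // Sample markup modeled on the question (an assumption for illustration).
        String html = "<h3 class=\"number\">"
                + "<span class=\"navigation\">"
                + "6:55 <a href=\"/results/result.html\" class=\"under\"><b>&raquo;</b></a>"
                + "</span>This is the text I need to parse!</h3>";
        System.out.println(headingOwnText(html));
    }
}
```

Unlike the string replacement above, this does not depend on the span's text being unique within the heading, but it only helps when the text to keep is a direct child of the selected element.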