I am trying to parse html tags here using jsoup. I am new to jsoup. Basically I need to parse the tags and get the text inside those tags and apply the style mentioned in the class attribute.
I am creating a SpannableStringBuilder for that I can create substrings, apply styles and append them together with texts that have no styles.
String str = "There are <span class='newStyle'> two </span> workers from the <span class='oldStyle'>Front of House</span>";
SpannableStringBuilder text = new SpannableStringBuilder();
if (value.contains("</span>")) {
Document document = Jsoup.parse(value);
Elements elements = document.getElementsByTag("span");
if (elements != null) {
int i = 0;
int start = 0;
for (Element ele : elements) {
String styleName = type + "." + ele.attr("class");
text.append(ele.text());
int style = context.getResources().getIdentifier(styleName, "style", context.getPackageName());
text.setSpan(new TextAppearanceSpan(context, style), start, text.length(), Spannable.SPAN_EXCLUSIVE_EXCLUSIVE);
text.append(ele.nextSibling().toString());
start = text.length();
i++;
}
}
return text;
}
I am not sure how I can parse the strings that are not between any tags such as the "There are" and "worker from the".
Need output such as:
- There are
- <span class='newStyle'> two </span>
- workers from the
- <span class='oldStyle'>Front of House</span>
Full answer: you can get the text outside of the tags by getting childNodes()
. This way you obtain List<Node>
. Note I'm selecting body
because your HTML fragment doesn't have any parent element and parsing HTML fragment with jsoup adds <html>
and <body>
automatically.
If Node
contains only text it's of type TextNode
and you can get the content using toString()
.
Otherwise you can cast it to Element
and get the text usingelement.text()
.
String str = "There are <span class='newStyle'> two </span> workers from the <span class='oldStyle'>Front of House</span>";
Document doc = Jsoup.parse(str);
Element body = doc.selectFirst("body");
List<Node> childNodes = body.childNodes();
for (int i = 0; i < childNodes.size(); i++) {
Node node = body.childNodes().get(i);
if (node instanceof TextNode) {
System.out.println(i + " -> " + node.toString());
} else {
Element element = (Element) node;
System.out.println(i + " -> " + element.text());
}
}
output:
0 ->
There are
1 -> two
2 -> workers from the
3 -> Front of House
By the way: I don't know how to get rid of the first line break before There are
.