I can't find a way to extract only the root element's own text content using com.gargoylesoftware.htmlunit.html. Here is an example:
<td>
W 03:10 PM-04:25 PM
<strong>
<br>
Hybrid (50%+ in-person)
</strong>
</td>
I want to extract the text content of the root element only (the "td" in this case), but getTextContent() also extracts the text content of the child elements, which is the part I don't want:
private void extractTextContent(HtmlElement htmlElement) {
    String content = htmlElement.getTextContent();
    System.out.println(content);
}
output:
W 03:10 PM-04:25 PMHybrid (50%+ in-person)
desired output:
W 03:10 PM-04:25 PM
I've also tried asText(), but that doesn't give me the desired output either. I couldn't find anyone else asking the same question about com.gargoylesoftware.htmlunit.html. Is there a method that extracts the text content of the root element only?
EDIT: Thank you for the answer. I used the same idea of deleting the child node to get my desired output. Here is the Java syntax:
private void extractTextContent(HtmlElement htmlElement) {
    // Note: this modifies the DOM of the parsed page.
    DomNode child = htmlElement.getLastElementChild();
    if (child != null) {
        // Detach the last child element (<strong> in the example) so its text is excluded.
        child.remove();
    }
    String content = htmlElement.getTextContent();
    System.out.println(content);
}
You can try removing the child nodes before fetching the text content.
private void extractTextContent(HtmlElement htmlElement) {
    // Note: this modifies the DOM of the parsed page.
    DomNode child = htmlElement.getLastElementChild();
    if (child != null) {
        // Detach the last child element (<strong> in the example) so its text is excluded.
        child.remove();
    }
    String content = htmlElement.getTextContent();
    System.out.println(content);
}
I have edited my answer with the Java syntax provided by @XYZ.
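If you would rather not modify the page, a possible alternative is to walk the element's direct children and keep only the text nodes. The sketch below is an illustration rather than HtmlUnit-specific code: it uses the JDK's own org.w3c.dom parser so that it runs without extra dependencies, the names OwnTextExample, ownText and parseRoot are made up for the demo, and the <br> from the question is written as <br/> because the JDK parser expects well-formed XML. As far as I know, HtmlUnit's DomNode implements org.w3c.dom.Node, so the same ownText helper should also accept an HtmlElement, but treat that as an assumption to verify against your HtmlUnit version.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class OwnTextExample {

    // Concatenates only the text nodes that are direct children of the node,
    // skipping child elements such as <strong> and everything inside them.
    // The node itself is not modified.
    public static String ownText(Node node) {
        StringBuilder sb = new StringBuilder();
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Hypothetical helper for this demo only: parses a well-formed XML
    // snippet with the JDK parser and returns its root element.
    public static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse snippet", e);
        }
    }

    public static void main(String[] args) {
        Node td = parseRoot(
                "<td>W 03:10 PM-04:25 PM<strong><br/>Hybrid (50%+ in-person)</strong></td>");
        System.out.println(ownText(td)); // prints "W 03:10 PM-04:25 PM"
    }
}
```

Unlike removing the child, this leaves the DOM untouched, and it also picks up text that appears after a child element (for example, "A<strong>B</strong>C" gives "AC").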