I have the following:
<th>
Q4/10
<br>
<span> Nov 30, 2010 </span>
</th>
and I'd like to get Q4/10
but not the date that follows. I'm not sure how to do it within HtmlUnit. I know I could split the cell's text on whitespace and take everything before the first space, but I'm looking for something based on the tags themselves.
If you know that the text you want comes before any child elements, you can just grab the cell's first child, which is a text node containing your text plus some surrounding whitespace:
HtmlTableHeaderCell th = ...
// Read the text node's content directly instead of relying on toString()
System.err.println(th.getFirstChild().getTextContent().trim());
The more general solution would be to loop through the children of th,
collecting the text nodes and ignoring child elements.
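Here is a rough sketch of that loop. It is written against the standard org.w3c.dom.Node interface, which HtmlUnit's DomNode implements, so the same helper should accept an HtmlTableHeaderCell. The class name DirectText, the helper names directText and parse, and the use of the JDK's XML parser as a stand-in for a real HtmlUnit page are all assumptions made for illustration, not part of HtmlUnit.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

class DirectText {

    // Hypothetical helper: joins only the text nodes that are direct
    // children of parent, skipping child elements such as <br> and <span>.
    static String directText(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString().trim();
    }

    // Stand-in for loading a page with HtmlUnit: parses a well-formed
    // snippet with the JDK's XML parser and returns its root element.
    static Node parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse snippet", e);
        }
    }

    public static void main(String[] args) {
        Node th = parse("<th>\nQ4/10\n<br/>\n<span> Nov 30, 2010 </span>\n</th>");
        System.out.println(directText(th)); // prints "Q4/10"
    }
}
```

With HtmlUnit you would skip parse and pass the th cell itself to directText. Unlike the first-child approach, this also picks up text that appears between or after child elements.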