When I run the following code:
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.EditorKit;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;
.
.
.
String content = "x";
String html = "<html><body><dyn/>" + content + "<dyn/></body></html>";
final Reader reader = new StringReader(html);
final EditorKit editorKit = new HTMLEditorKit();
HTMLDocument hTMLDocument = new HTMLDocument();
editorKit.read(reader, hTMLDocument, 0);
Element defaultRootElement = hTMLDocument.getDefaultRootElement();
Element branchElement = defaultRootElement.getElement(1).getElement(0);
for (int i = 0; i < branchElement.getElementCount(); i++) {
    Element element = branchElement.getElement(i);
    System.out.print(element);
}
I get the following output:
LeafElement(dyn) 1,2
LeafElement(content) 2,3
LeafElement(dyn) 3,4
LeafElement(content) 4,5
However, if I change the value of content to " ":
String content = " ";
I get this output:
LeafElement(dyn) 1,2
LeafElement(dyn) 2,3
LeafElement(content) 3,4
Why is a content LeafElement constructed for "x", but not for " "? I want a LeafElement to be constructed for " ". Am I doing something wrong, or is this a problem with HTMLDocument or HTMLEditorKit?
This is just the product of whitespace collapsing in HTML. Since the space you're inserting is the only thing between the two <dyn/> tags, the parser ignores it, so it is not represented by a LeafElement.
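To make the difference concrete, here is a minimal sketch that reuses the traversal from the question and lists the leaf element names for three inputs: "x", a plain space, and a non-breaking space entity. The class name WhitespaceLeafDemo and the helper leafNames are made up for illustration. The first two cases should match the output shown in the question. For &nbsp; the expectation (not verified on every JDK) is that the content leaf is kept, because the entity is parsed into the character U+00A0, which the parser does not collapse as ordinary whitespace.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

// Hypothetical demo class, not part of Swing.
class WhitespaceLeafDemo {

    // Parses <dyn/> + content + <dyn/> and returns the names of the leaf
    // elements found under the first child of <body>, using the same
    // element path as the code in the question.
    static List<String> leafNames(String content) {
        String html = "<html><body><dyn/>" + content + "<dyn/></body></html>";
        HTMLEditorKit editorKit = new HTMLEditorKit();
        HTMLDocument document = new HTMLDocument();
        try {
            editorKit.read(new StringReader(html), document, 0);
        } catch (IOException | BadLocationException e) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new IllegalStateException("Could not parse: " + html, e);
        }
        Element branch = document.getDefaultRootElement().getElement(1).getElement(0);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < branch.getElementCount(); i++) {
            names.add(branch.getElement(i).getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Compare a letter, a plain space, and a non-breaking space entity.
        for (String content : new String[] {"x", " ", "&nbsp;"}) {
            System.out.println("\"" + content + "\" -> " + leafNames(content));
        }
    }
}
```

If the &nbsp; case prints an extra content entry compared with the plain space, that confirms the collapse only affects ordinary whitespace. Note that the text of that leaf is then U+00A0 rather than a regular space, so any later string comparison has to account for that.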
As camickr mentioned, you would have to use non-breaking space entities (&nbsp;) to preserve all whitespace. But since you have no control over the HTML, your best bet is to customise HTMLEditorKit's parser. Perhaps the following resources may be useful:
Hope this helps!