I have a bunch of questions related to whitespace handling with `XmlDocument`. Please see the numbered comments in the example below.
1. Shouldn't all whitespace be significant in mixed content? Why is the space between the `a` tags not significant?
2. While I understand that the whitespace node itself is still an `XmlWhitespace`, how do I normalize these spaces into `XmlSignificantWhitespace` nodes? `Normalize()` doesn't work.
3. Is my only option to do it manually?
Here's my test case:
```csharp
using System;
using System.Linq;
using System.Xml;

internal static class Program
{
    private static void Main()
    {
        // 1. Shouldn't all whitespace be significant in mixed content? Why is the space between the a tags not significant?
        var doc = new XmlDocument
        {
            InnerXml = "<root>test1 <a>test2</a> <a>test3</a></root>",
        };
        PrintDoc(doc);

        // 2.a. While I understand that the whitespace node itself is still an XmlWhitespace,
        //      how do I normalize these spaces into XmlSignificantWhitespace nodes?
        doc.DocumentElement.RemoveAll();
        doc.DocumentElement.SetAttribute("xml:space", "preserve");
        var fragment = doc.CreateDocumentFragment();
        fragment.InnerXml = "test1 <a>test2</a> <a>test3</a>";
        doc.DocumentElement.PrependChild(fragment);
        PrintDoc(doc);

        // 2.b. Normalize doesn't work
        doc.Normalize();
        PrintDoc(doc);

        // 3.a. Manual normalization does work, is there a better way?
        doc.DocumentElement.RemoveAllAttributes();
        var whitespaces = doc.DocumentElement.ChildNodes.Cast<XmlNode>()
            .OfType<XmlWhitespace>()
            .ToList();
        foreach (var whitespace in whitespaces)
        {
            var significant = doc.CreateSignificantWhitespace(whitespace.Value);
            doc.DocumentElement.ReplaceChild(significant, whitespace);
        }
        PrintDoc(doc);

        // 3.b. Reading from string also works
        doc.InnerXml = "<root xml:space=\"preserve\">test1 <a>test2</a> <a>test3</a></root>";
        PrintDoc(doc);
    }

    // Prints the root's markup and how many of its direct children are
    // XmlWhitespace vs. XmlSignificantWhitespace nodes.
    private static void PrintDoc(XmlDocument doc)
    {
        var nodes = doc.DocumentElement.ChildNodes.Cast<XmlNode>().ToList();
        var whitespace = nodes.OfType<XmlWhitespace>().Count();
        var significantWhitespace = nodes.OfType<XmlSignificantWhitespace>().Count();
        Console.WriteLine($"Xml: {doc.InnerXml}\nwhitespace: {whitespace}\nsignificant whitespace: {significantWhitespace}\n");
    }
}
```
The output is as follows:
```
Xml: <root>test1 <a>test2</a><a>test3</a></root>
whitespace: 0
significant whitespace: 0

Xml: <root xml:space="preserve">test1 <a>test2</a> <a>test3</a></root>
whitespace: 1
significant whitespace: 0

Xml: <root xml:space="preserve">test1 <a>test2</a> <a>test3</a></root>
whitespace: 1
significant whitespace: 0

Xml: <root>test1 <a>test2</a> <a>test3</a></root>
whitespace: 0
significant whitespace: 1

Xml: <root xml:space="preserve">test1 <a>test2</a> <a>test3</a></root>
whitespace: 0
significant whitespace: 1
```
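For reference, a minimal sketch of the manual route from 3.a, generalized to walk the whole subtree instead of only the root's direct children. It assumes the document is loaded with `XmlDocument.PreserveWhitespace = true`, which (as far as I can tell) keeps the whitespace-only node between the `a` elements instead of dropping it as in case 1, but still loads it as `XmlWhitespace` when no `xml:space="preserve"` is in scope. `WhitespaceHelper`, `MakeWhitespaceSignificant`, `CountChildren` and `Demo` are hypothetical names, not part of `System.Xml`.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class WhitespaceHelper
{
    // Hypothetical helper: replaces every XmlWhitespace node below `element`
    // (at any depth) with an XmlSignificantWhitespace node holding the same text.
    // Returns how many nodes were converted.
    public static int MakeWhitespaceSignificant(XmlElement element)
    {
        XmlDocument doc = element.OwnerDocument;
        int converted = 0;

        // Snapshot the children first, because ReplaceChild mutates the live ChildNodes list.
        foreach (XmlNode child in element.ChildNodes.Cast<XmlNode>().ToList())
        {
            if (child is XmlWhitespace)
            {
                element.ReplaceChild(doc.CreateSignificantWhitespace(child.Value), child);
                converted++;
            }
            else if (child is XmlElement childElement)
            {
                converted += MakeWhitespaceSignificant(childElement);
            }
        }

        return converted;
    }

    // Counts the direct children of `element` whose node class is T.
    public static int CountChildren<T>(XmlElement element) where T : XmlNode =>
        element.ChildNodes.OfType<T>().Count();
}

public static class Demo
{
    public static void Main()
    {
        // Assumption: PreserveWhitespace = true keeps the space between the <a> elements,
        // but without xml:space="preserve" it is loaded as XmlWhitespace.
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml("<root>test1 <a>test2</a> <a>test3</a></root>");
        Console.WriteLine($"whitespace before: {WhitespaceHelper.CountChildren<XmlWhitespace>(doc.DocumentElement)}");

        int converted = WhitespaceHelper.MakeWhitespaceSignificant(doc.DocumentElement);
        Console.WriteLine($"converted: {converted}");
        Console.WriteLine($"significant whitespace after: {WhitespaceHelper.CountChildren<XmlSignificantWhitespace>(doc.DocumentElement)}");
        Console.WriteLine(doc.OuterXml);
    }
}
```

The helper is still "doing it manually"; it only packages the loop from 3.a so it can be applied to any element, nested ones included. It takes an `XmlElement` rather than the document itself to avoid placing significant whitespace directly under the document node.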
Writing your own `XmlNodeReader` subclass seems to work, although it is not the "cleanest" solution.
Consider the current implementation of `XmlReader.MoveToContent()`:
```csharp
public virtual XmlNodeType MoveToContent() {
    do {
        switch (this.NodeType) {
            case XmlNodeType.Attribute:
                MoveToElement();
                goto case XmlNodeType.Element;
            case XmlNodeType.Element:
            case XmlNodeType.EndElement:
            case XmlNodeType.CDATA:
            case XmlNodeType.Text:
            case XmlNodeType.EntityReference:
            case XmlNodeType.EndEntity:
                return this.NodeType;
        }
    } while (Read());
    return this.NodeType;
}
```
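A minimal sketch of the consequence with the stock `XmlNodeReader` (which, as far as I can tell, inherits this `MoveToContent()` unchanged): when the reader is positioned on an `XmlSignificantWhitespace` node, `MoveToContent()` does not stop there and moves on to the next element. `MoveToContentDemo` and `NodeTypeAfterMoveToContent` are hypothetical names used only for illustration.

```csharp
using System;
using System.Xml;

public static class MoveToContentDemo
{
    // Hypothetical helper: loads `xml`, positions a stock XmlNodeReader on the first
    // child of the root element and reports where MoveToContent() ends up.
    public static XmlNodeType NodeTypeAfterMoveToContent(string xml)
    {
        var doc = new XmlDocument(); // PreserveWhitespace is false by default
        doc.LoadXml(xml);

        using (var reader = new XmlNodeReader(doc.DocumentElement))
        {
            reader.Read(); // now on <root>
            reader.Read(); // now on the first child of <root>
            return reader.MoveToContent();
        }
    }

    public static void Main()
    {
        // Under xml:space="preserve" the leading space is an XmlSignificantWhitespace node,
        // but MoveToContent() skips it and lands on <a>.
        Console.WriteLine(NodeTypeAfterMoveToContent("<root xml:space=\"preserve\"> <a>test</a></root>")); // prints "Element"
    }
}
```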
To treat `SignificantWhitespace` as content, also return `NodeType` when it is `XmlNodeType.SignificantWhitespace`.
Here's the complete implementation of my own `WhitespaceXmlNodeReader`:
```csharp
using System.Xml;

// XmlNodeReader whose MoveToContent() also stops on significant whitespace,
// so that whitespace under xml:space="preserve" is treated as content.
internal class WhitespaceXmlNodeReader : XmlNodeReader
{
    public WhitespaceXmlNodeReader(XmlNode node)
        : base(node)
    {
    }

    public override XmlNodeType MoveToContent()
    {
        do
        {
            switch (NodeType)
            {
                case XmlNodeType.Attribute:
                    MoveToElement();
                    goto case XmlNodeType.Element;
                case XmlNodeType.Element:
                case XmlNodeType.EndElement:
                case XmlNodeType.CDATA:
                case XmlNodeType.Text:
                case XmlNodeType.EntityReference:
                case XmlNodeType.EndEntity:
                // This was added:
                case XmlNodeType.SignificantWhitespace:
                    return NodeType;
            }
        } while (Read());
        return NodeType;
    }
}
```