I have an XML document, and I am trying to separate its nodes from each other. I want the root node alone, then the second node alone, and then a list of the nodes that exist within the second node. The problem is that when I remove the children of the root node or the second node, my list becomes empty. I don't understand why this happens, especially given the odd behavior below.
using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(@"C:\Users\Username\Desktop\Diagram.xml");
        XmlNode rootNode = doc.DocumentElement;
        XmlNode secondNode = doc.SelectSingleNode(rootNode.Name + "/root");
        XmlNodeList nodelist = doc.SelectNodes("//root/mxCell");
        Console.WriteLine("-----------------------------------------");
        Console.WriteLine(RemoveChildren(rootNode).OuterXml);
        Console.WriteLine("-----------------------------------------");
        Console.WriteLine(RemoveChildren(secondNode).OuterXml);
        Console.WriteLine("-----------------------------------------");
        //Console.WriteLine(rootNode.OuterXml);
        Console.WriteLine(nodelist.Count); // Becomes 0
        if (nodelist != null && nodelist.Count > 0)
        {
            foreach (XmlNode n in nodelist)
            {
                Console.WriteLine(n.OuterXml);
            }
        }
        Console.ReadLine();
    }

    // Removes every child of n and returns n itself.
    private static XmlNode RemoveChildren(XmlNode n)
    {
        while (n.FirstChild != null)
        {
            n.RemoveChild(n.FirstChild);
        }
        return n;
    }
}
If I run this code, nodelist.Count becomes 0. Why does the node list become empty while I can still access the second node?
However, if I add the foreach loop right after doc.SelectNodes("//root/mxCell"); the count becomes 4, like this:
using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(@"C:\Users\Username\Desktop\Diagram.xml");
        XmlNode rootNode = doc.DocumentElement;
        XmlNode secondNode = doc.SelectSingleNode(rootNode.Name + "/root");
        XmlNodeList nodelist = doc.SelectNodes("//root/mxCell");
        // Added code here
        if (nodelist != null && nodelist.Count > 0)
        {
            foreach (XmlNode n in nodelist)
            {
                Console.WriteLine(n.OuterXml);
            }
        }
        // End of added code
        Console.WriteLine("-----------------------------------------");
        Console.WriteLine(RemoveChildren(rootNode).OuterXml);
        Console.WriteLine("-----------------------------------------");
        Console.WriteLine(RemoveChildren(secondNode).OuterXml);
        Console.WriteLine("-----------------------------------------");
        //Console.WriteLine(rootNode.OuterXml);
        Console.WriteLine(nodelist.Count); // Becomes 4
        if (nodelist != null && nodelist.Count > 0)
        {
            foreach (XmlNode n in nodelist)
            {
                Console.WriteLine(n.OuterXml);
            }
        }
        Console.ReadLine();
    }

    // Removes every child of n and returns n itself.
    private static XmlNode RemoveChildren(XmlNode n)
    {
        while (n.FirstChild != null)
        {
            n.RemoveChild(n.FirstChild);
        }
        return n;
    }
}
The count is now 4.
Here is the XML used:
<mxGraphModel dx="1086" dy="596" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
  <root>
    <mxCell id="0"/>
    <mxCell id="1" parent="0"/>
    <mxCell id="YJb7HCrh72y2aGPrfETQ-1" value="" style="endArrow=classic;html=1;" parent="1" edge="1">
      <mxGeometry width="50" height="50" relative="1" as="geometry">
        <mxPoint x="130" y="310" as="sourcePoint"/>
        <mxPoint x="180" y="260" as="targetPoint"/>
      </mxGeometry>
    </mxCell>
    <mxCell id="YJb7HCrh72y2aGPrfETQ-2" value="" style="endArrow=classic;html=1;" parent="1" edge="1">
      <mxGeometry width="50" height="50" relative="1" as="geometry">
        <mxPoint x="290" y="270" as="sourcePoint"/>
        <mxPoint x="340" y="220" as="targetPoint"/>
      </mxGeometry>
    </mxCell>
  </root>
</mxGraphModel>
The behavior you are observing can be demonstrated more simply as follows. The following unit test will succeed:
XmlNodeList nodelist = doc.SelectNodes("*/*"); // Select all children of the root node
RemoveChildren(doc.DocumentElement);
Assert.IsTrue(nodelist.Count == 0); // Passes
While the following will fail:
XmlNodeList nodelist = doc.SelectNodes("*/*"); // Select all children of the root node
Assert.IsTrue(nodelist.Count > 0); // Passes
RemoveChildren(doc.DocumentElement);
Assert.IsTrue(nodelist.Count == 0); // FAILS!?
Why might the addition of a mere nodelist.Count before removing some nodes cause an inconsistency in the contents of nodelist after the nodes are removed?
As it turns out, there is no specified behavior for the XmlNodeList returned by SelectNodes() in this situation. From the documentation remarks for XmlNode.SelectNodes():
The XmlNodeList object returned by this method will be valid while the underlying document remains unchanged. If the underlying document changes, unexpected results may be returned (no exception will be thrown).
Such "unexpected results" are what you are observing. Once RemoveChildren() is called, the contents of nodelist are not specified or guaranteed by .NET. (In fact, it appears that the returned XmlNodeList uses a lazy evaluation mechanism: the XPath query is evaluated the first time the node list is counted or iterated, but not before, and the results are then cached and reused without being evaluated again.)
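The lazy evaluation described above can be observed with a small self-contained sketch. It uses a hypothetical two-child inline document in place of Diagram.xml, and the class and method names are made up for illustration. Because the documentation leaves this behavior unspecified, treat whatever it prints as an observation about a particular runtime, not a guarantee.

```csharp
using System;
using System.Xml;

static class LazyNodeListDemo
{
    // Hypothetical stand-in for Diagram.xml: a root with two children.
    const string Xml = "<a><b/><b/></a>";

    static void RemoveChildren(XmlNode n)
    {
        while (n.FirstChild != null)
        {
            n.RemoveChild(n.FirstChild);
        }
    }

    // Returns nodelist.Count as seen AFTER the children were removed.
    // If touchFirst is true, Count is also read once BEFORE the removal.
    public static int CountAfterRemoval(bool touchFirst)
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);
        XmlNodeList nodelist = doc.SelectNodes("*/*");

        if (touchFirst)
        {
            // Reading Count appears to force the XPath query to run
            // and its results to be cached.
            int before = nodelist.Count;
            Console.WriteLine("Count before removal: " + before);
        }

        RemoveChildren(doc.DocumentElement);
        return nodelist.Count;
    }

    static void Main()
    {
        Console.WriteLine("Untouched list after removal: " + CountAfterRemoval(false));
        Console.WriteLine("Touched list after removal: " + CountAfterRemoval(true));
    }
}
```

If the runtime behaves as described in this answer, the untouched list reports no nodes after the removal, while the list that was counted beforehand still reports the nodes it cached.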
This documented restriction on node lists returned by XPath queries is in contrast to the general documentation for XmlNodeList, which states:
Changes to the children of the node object that the XmlNodeList collection was created from are immediately reflected in the nodes returned by the XmlNodeList properties and methods.
That remark appears to apply only to XmlNodeList child lists returned by methods of XmlNode DOM objects.
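For contrast, here is a minimal sketch of a "live" child list, using the XmlNode.ChildNodes property on a hypothetical inline document (the class and method names are illustrative only). Unlike the XPath result, this list is documented to track changes to the node it came from.

```csharp
using System;
using System.Xml;

static class LiveChildListDemo
{
    // Returns the child count before and after removing all children,
    // read through the same XmlNodeList instance both times.
    public static int[] CountChildrenBeforeAndAfter()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<a><b/><b/></a>"); // hypothetical sample document

        XmlNode root = doc.DocumentElement;
        XmlNodeList children = root.ChildNodes; // live view of root's children

        int before = children.Count;
        while (root.FirstChild != null)
        {
            root.RemoveChild(root.FirstChild);
        }
        int after = children.Count;

        return new[] { before, after };
    }

    static void Main()
    {
        int[] counts = CountChildrenBeforeAndAfter();
        Console.WriteLine("Before: " + counts[0] + ", after: " + counts[1]);
    }
}
```

Here the count is 2 before the removal and 0 afterwards, regardless of whether the list was read before the document changed.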
To avoid the inconsistency, you can explicitly materialize the XmlNodeList to a List<XmlNode> immediately after evaluation like so:
// Requires "using System.Linq;" for Cast<T>() and ToList()
var nodelist = doc.SelectNodes("*/*").Cast<XmlNode>().ToList();
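As a fuller sketch of the same idea, the following program takes the snapshot and then clears the document, again using a hypothetical inline document and illustrative names. The List<XmlNode> holds ordinary references to the nodes, so it is unaffected by later changes to the tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

static class SnapshotDemo
{
    // Materializes the XPath result, then removes all children of the root.
    // Returns the snapshot so the caller can inspect it afterwards.
    public static List<XmlNode> SnapshotThenClear()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<a><b/><b/></a>"); // hypothetical sample document

        // Evaluate the query right away and copy the results into a list.
        List<XmlNode> snapshot = doc.SelectNodes("*/*").Cast<XmlNode>().ToList();

        XmlNode root = doc.DocumentElement;
        while (root.FirstChild != null)
        {
            root.RemoveChild(root.FirstChild);
        }
        return snapshot;
    }

    static void Main()
    {
        List<XmlNode> snapshot = SnapshotThenClear();
        Console.WriteLine(snapshot.Count);
        foreach (XmlNode n in snapshot)
        {
            // The nodes are detached from the tree but still usable.
            Console.WriteLine(n.OuterXml);
        }
    }
}
```

The snapshot still contains both nodes after the removal. They are detached from the document at that point, so their ParentNode is null.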
Switching to LINQ to XML would be another option, as it's generally easier to work with than the older XmlDocument API.
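A rough sketch of how the original task might look with LINQ to XML follows, assuming a trimmed-down version of the question's XML. The class, method, and variable names are hypothetical. Instead of removing children from the loaded document, it builds childless copies of the root and second elements, so the list of mxCell elements is never disturbed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

static class LinqToXmlDemo
{
    // Trimmed-down, hypothetical version of the XML from the question.
    const string Xml =
        "<mxGraphModel dx=\"1086\" dy=\"596\">" +
        "<root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/></root>" +
        "</mxGraphModel>";

    // Copies an element's name and attributes, leaving its children behind.
    public static XElement WithoutChildren(XElement e)
    {
        return new XElement(e.Name, e.Attributes());
    }

    // Elements() is lazy as well, so ToList() takes a snapshot right away.
    public static List<XElement> Cells(XDocument doc)
    {
        return doc.Root.Element("root").Elements("mxCell").ToList();
    }

    static void Main()
    {
        XDocument doc = XDocument.Parse(Xml);

        XElement rootAlone = WithoutChildren(doc.Root);
        XElement secondAlone = WithoutChildren(doc.Root.Element("root"));
        List<XElement> cells = Cells(doc);

        Console.WriteLine(rootAlone);
        Console.WriteLine(secondAlone);
        foreach (XElement cell in cells)
        {
            Console.WriteLine(cell);
        }
    }
}
```

Copying instead of mutating is a design choice for this sketch. If the original document really must be emptied, take the ToList() snapshot first and remove the children afterwards.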