Tags: c#, xml, xmlnode, xmlnodelist

Why does XmlNodeList.Count return inconsistent results after modifying the XML document?


I have an XML document, and I am trying to separate its nodes from each other. I want the root node alone, then the second node alone, and then a list of the nodes that exist within the second node. I have run into a problem: when I remove nodes from the second node or the root node, my list becomes empty. I don't understand why this happens, especially given the weird behavior below.

using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        XmlDocument doc = new XmlDocument();

        doc.Load(@"C:\Users\Username\Desktop\Diagram.xml");

        XmlNode rootNode = doc.DocumentElement;
        XmlNode secondNode = doc.SelectSingleNode(rootNode.Name + "/root");

        XmlNodeList nodelist = doc.SelectNodes("//root/mxCell");

        Console.WriteLine("-----------------------------------------");
        Console.WriteLine(RemoveChildren(rootNode).OuterXml);
        Console.WriteLine("-----------------------------------------");
        Console.WriteLine(RemoveChildren(secondNode).OuterXml);
        Console.WriteLine("-----------------------------------------");
        //Console.WriteLine(rootNode.OuterXml);
        Console.WriteLine(nodelist.Count); //Becomes 0
        if (nodelist != null && nodelist.Count > 0)
        {
            foreach (XmlNode n in nodelist)
            {
                Console.WriteLine(n.OuterXml);
            }
        }

        Console.ReadLine();
    }

    private static XmlNode RemoveChildren(XmlNode n) {

        while (n.FirstChild != null)
        {
            n.RemoveChild(n.FirstChild);
        }

        return n;
    }
}

If I run this code, nodelist.Count is 0. Why does the node list become empty, while I can still access the second node?


However, if I add a foreach loop right after doc.SelectNodes("//root/mxCell");, the count becomes 4.

Like this:

using System;
using System.Xml;

class Program
{
    static void Main(string[] args)
    {
        XmlDocument doc = new XmlDocument();

        doc.Load(@"C:\Users\Username\Desktop\Diagram.xml");

        XmlNode rootNode = doc.DocumentElement;
        XmlNode secondNode = doc.SelectSingleNode(rootNode.Name + "/root");

        XmlNodeList nodelist = doc.SelectNodes("//root/mxCell");

        // Added code here
        if (nodelist != null && nodelist.Count > 0)
        {
            foreach (XmlNode n in nodelist)
            {
                Console.WriteLine(n.OuterXml);
            }
        }
        // End of added code

        Console.WriteLine("-----------------------------------------");
        Console.WriteLine(RemoveChildren(rootNode).OuterXml);
        Console.WriteLine("-----------------------------------------");
        Console.WriteLine(RemoveChildren(secondNode).OuterXml);
        Console.WriteLine("-----------------------------------------");
        //Console.WriteLine(rootNode.OuterXml);
        Console.WriteLine(nodelist.Count); //Becomes 4
        if (nodelist != null && nodelist.Count > 0)
        {
            foreach (XmlNode n in nodelist)
            {
                Console.WriteLine(n.OuterXml);
            }
        }

        Console.ReadLine();
    }

    private static XmlNode RemoveChildren(XmlNode n) {

        while (n.FirstChild != null)
        {
            n.RemoveChild(n.FirstChild);
        }

        return n;
    }
}

The count is now 4.


Here is the XML used:

<mxGraphModel dx="1086" dy="596" grid="1" gridSize="10" guides="1" tooltips="1" connect="1" arrows="1" fold="1" page="1" pageScale="1" pageWidth="850" pageHeight="1100" math="0" shadow="0">
  <root>
    <mxCell id="0"/>
    <mxCell id="1" parent="0"/>
    <mxCell id="YJb7HCrh72y2aGPrfETQ-1" value="" style="endArrow=classic;html=1;" parent="1" edge="1">
      <mxGeometry width="50" height="50" relative="1" as="geometry">
        <mxPoint x="130" y="310" as="sourcePoint"/>
        <mxPoint x="180" y="260" as="targetPoint"/>
      </mxGeometry>
    </mxCell>
    <mxCell id="YJb7HCrh72y2aGPrfETQ-2" value="" style="endArrow=classic;html=1;" parent="1" edge="1">
      <mxGeometry width="50" height="50" relative="1" as="geometry">
        <mxPoint x="290" y="270" as="sourcePoint"/>
        <mxPoint x="340" y="220" as="targetPoint"/>
      </mxGeometry>
    </mxCell>
  </root>
</mxGraphModel>

Solution

    The behavior you are observing can be demonstrated more simply as follows. The following unit test will succeed (demo fiddle here):

    XmlNodeList nodelist = doc.SelectNodes("*/*"); // Select all children of the root node
    RemoveChildren(doc.DocumentElement);
    Assert.IsTrue(nodelist.Count == 0); // Passes
    

    While the following will fail:

    XmlNodeList nodelist = doc.SelectNodes("*/*"); // Select all children of the root node
    Assert.IsTrue(nodelist.Count > 0);  // Passes
    RemoveChildren(doc.DocumentElement);
    Assert.IsTrue(nodelist.Count == 0); // FAILS!?
    

    Why might the addition of a mere nodelist.Count before removing some nodes cause an inconsistency in the contents of nodelist after the nodes are removed?

    As it turns out, there is no specified behavior for the XmlNodeList returned by SelectNodes() in this situation. From the documentation remarks for XmlNode.SelectNodes():

    The XmlNodeList object returned by this method will be valid while the underlying document remains unchanged. If the underlying document changes, unexpected results may be returned (no exception will be thrown).

    Such "unexpected results" are what you are observing. Once RemoveChildren() is called, the contents of nodelist are not specified or guaranteed by .NET. (In fact, it appears that the returned XmlNodeList uses a lazy evaluation mechanism: as soon as the node list is counted or iterated through, but not before, the XPath query is evaluated once and only once, and the results are cached and subsequently reused.)

    This documented restriction on node lists returned by XPath queries is in contrast to the general documentation for XmlNodeList which states:

    Changes to the children of the node object that the XmlNodeList collection was created from are immediately reflected in the nodes returned by the XmlNodeList properties and methods.

    That remark appears to apply only to XmlNodeList child lists returned by methods of XmlNode DOM objects.
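For contrast, here is a minimal sketch of the live behavior that remark describes, using XmlNode.ChildNodes instead of an XPath query. The class and method names and the trimmed inline XML are hypothetical stand-ins for the question's setup. Reading Count before the removal does not freeze this list: ChildNodes tracks the parent's current children, so the second Count reflects the removal.

```csharp
using System;
using System.Xml;

// Hypothetical demo class; the XML is a trimmed stand-in for Diagram.xml.
public static class LiveChildNodesDemo
{
    public static (int Before, int After) CountChildNodesAroundRemoval()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<mxGraphModel><root><mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/></root></mxGraphModel>");

        XmlNode root = doc.DocumentElement.FirstChild; // the <root> element
        XmlNodeList children = root.ChildNodes;        // live child list, not an XPath result

        int before = children.Count; // read before the removal, as in the question's second version

        // Same removal loop as RemoveChildren() in the question.
        while (root.FirstChild != null)
        {
            root.RemoveChild(root.FirstChild);
        }

        // The same XmlNodeList instance now reports the current (empty) set of children.
        return (before, children.Count);
    }

    public static void Main()
    {
        var (before, after) = CountChildNodesAroundRemoval();
        Console.WriteLine($"before: {before}, after: {after}"); // before: 2, after: 0
    }
}
```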

    To avoid the inconsistency, you can explicitly materialize the XmlNodeList into a List<XmlNode> immediately after calling SelectNodes() (this requires using System.Linq;), like so:

    var nodelist = doc.SelectNodes("*/*").Cast<XmlNode>().ToList();
    

    Demo fiddle here.
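Applied to the code in the question, a sketch might look like the following. It assumes an abbreviated inline copy of Diagram.xml (attributes trimmed) and hypothetical names SnapshotDemo and SplitDocument. Because cells is an ordinary List<XmlNode> filled before either RemoveChildren() call, it keeps all four mxCell elements even after they are detached from the document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class SnapshotDemo
{
    // Abbreviated version of the question's Diagram.xml (attributes and geometry trimmed).
    const string Xml =
        "<mxGraphModel grid=\"1\">" +
        "<root>" +
        "<mxCell id=\"0\"/>" +
        "<mxCell id=\"1\" parent=\"0\"/>" +
        "<mxCell id=\"YJb7HCrh72y2aGPrfETQ-1\" parent=\"1\" edge=\"1\"/>" +
        "<mxCell id=\"YJb7HCrh72y2aGPrfETQ-2\" parent=\"1\" edge=\"1\"/>" +
        "</root>" +
        "</mxGraphModel>";

    public static (XmlNode RootNode, XmlNode SecondNode, List<XmlNode> Cells) SplitDocument()
    {
        var doc = new XmlDocument();
        doc.LoadXml(Xml);

        XmlNode rootNode = doc.DocumentElement;
        XmlNode secondNode = doc.SelectSingleNode(rootNode.Name + "/root");

        // Copy the XPath results into a List<XmlNode> right away, before the document changes.
        List<XmlNode> cells = doc.SelectNodes("//root/mxCell").Cast<XmlNode>().ToList();

        RemoveChildren(rootNode);   // detaches <root> from <mxGraphModel>
        RemoveChildren(secondNode); // detaches the <mxCell> elements from <root>

        return (rootNode, secondNode, cells);
    }

    // Same helper as in the question.
    static XmlNode RemoveChildren(XmlNode n)
    {
        while (n.FirstChild != null)
        {
            n.RemoveChild(n.FirstChild);
        }
        return n;
    }

    public static void Main()
    {
        var (rootNode, secondNode, cells) = SplitDocument();
        Console.WriteLine(rootNode.OuterXml);
        Console.WriteLine(secondNode.OuterXml);
        Console.WriteLine(cells.Count); // 4
        foreach (XmlNode n in cells)
        {
            Console.WriteLine(n.OuterXml);
        }
    }
}
```

Removed nodes are only detached, not destroyed, so their OuterXml remains available through the list.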

    Switching to LINQ to XML would be another option, as it's generally easier to work with than the older XmlDocument API.
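As a rough sketch of that option, assuming System.Xml.Linq and hypothetical names LinqToXmlDemo, SampleXml and Split, the same split could be written as below. Note that XContainer.Elements() uses deferred execution, so without the ToList() call the query would be evaluated against the already-emptied root element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class LinqToXmlDemo
{
    // Abbreviated, hypothetical stand-in for Diagram.xml.
    public const string SampleXml =
        "<mxGraphModel grid=\"1\"><root>" +
        "<mxCell id=\"0\"/><mxCell id=\"1\" parent=\"0\"/>" +
        "<mxCell id=\"YJb7HCrh72y2aGPrfETQ-1\" parent=\"1\" edge=\"1\"/>" +
        "<mxCell id=\"YJb7HCrh72y2aGPrfETQ-2\" parent=\"1\" edge=\"1\"/>" +
        "</root></mxGraphModel>";

    public static (XElement GraphModel, XElement Root, List<XElement> Cells) Split(string xml)
    {
        XElement graphModel = XElement.Parse(xml);  // <mxGraphModel>
        XElement root = graphModel.Element("root"); // <root>

        // Elements() is deferred, so take the snapshot before anything is removed.
        List<XElement> cells = root.Elements("mxCell").ToList();

        graphModel.RemoveNodes(); // detaches <root>; attributes of <mxGraphModel> are kept
        root.RemoveNodes();       // detaches the <mxCell> elements

        return (graphModel, root, cells);
    }

    public static void Main()
    {
        var (graphModel, root, cells) = Split(SampleXml);
        Console.WriteLine(graphModel);
        Console.WriteLine(root);
        foreach (XElement cell in cells)
        {
            Console.WriteLine(cell);
        }
    }
}
```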