
Why does my C# XML code only work when I enumerate the node list first?


I am working on code that restructures an XML file so that subfolder nodes are actually nested within their parent node. The source XML has every folder as a separate child node of the root, instead of the tree you would expect, with subfolders inside their parent folders. The piece of code this question is about:

// Load original XML

string sFile = "PathFile";
XmlDocument doc = new XmlDocument();
doc.Load(sFile);

var n = doc.DocumentElement.SelectNodes("//*");   // Select all element nodes into node list n
// int nCount = n.Count;                          // If this line is uncommented, the code works

foreach (XmlNode x in n)
{
    // rest of the code
}

The code now works properly, but only sometimes, even though nothing changes between runs. I have narrowed it down to this: when debugging in Visual Studio, it goes wrong if I simply run the code from beginning to end. If I break halfway and inspect the XmlNodeList n (by hovering over it with the cursor and looking at the element count), it does work. After discovering this I added the

int nCount = n.Count; 

line and now the code works when running unsupervised from start to finish.

What is happening here and what is the correct way to address this issue? Note: doc.LoadXml does not work with this particular file.

Thank you loads,

Thomas


Solution

  • The short answer: Because of side-effects in the implementation of XmlNodeList.

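    XmlNode.SelectNodes() returns an XmlNodeList (technically, an XPathNodeList), which is evaluated lazily: the nodes matching the selection (in this case an XPath selection) are located on demand rather than all at once, so the list behaves much like a "live list" while it is still being filled.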

    As you iterate the XPathNodeList or access it in other ways, it makes its way through the matching nodes, building up an internal list as needed, and returning them as needed.

    So if you try to rearrange the document as you are iterating through the nodes, this can foul up the iteration and cause it to stop before you have gone through all of them. The iteration is basically chasing a moving target as the document shifts underneath it.

    However, in order to return a value for the Count property, the XPathNodeList basically needs to find every matching node and count them, so it goes through the entire set of matches and places them all in the internal list.

    public override int Count {
        get {
            if (! done) {
                ReadUntil(Int32.MaxValue);
            }
            return list.Count;
        }
    }
    

    I think this explains what you are seeing. When you access the Count property before making changes, it builds up the entire list of nodes, as a side-effect, so that list is still populated when you actually iterate through them.
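
    To make that side effect concrete, here is a toy model of a lazily filled list. It is only a sketch: LazyCache, Cached, and the constructor are hypothetical names made up for illustration, and the real XPathNodeList is more involved. It shows how indexing reads only as far as needed, while asking for Count forces the whole source to be read into the internal list.

    ```csharp
    using System;
    using System.Collections.Generic;

    // Toy model only, NOT the real XPathNodeList: items are pulled from a
    // lazy source on demand and remembered in an internal list.
    sealed class LazyCache<T>
    {
        private readonly IEnumerator<T> source;
        private readonly List<T> list = new List<T>();
        private bool done;

        public LazyCache(IEnumerable<T> items)
        {
            source = items.GetEnumerator();
        }

        // How many items have been pulled from the source so far.
        public int Cached => list.Count;

        // Pull items until the internal list covers 'index' or the source ends.
        private void ReadUntil(int index)
        {
            while (!done && list.Count <= index)
            {
                if (source.MoveNext())
                    list.Add(source.Current);
                else
                    done = true;
            }
        }

        // Counting requires reading the entire source (the side effect).
        public int Count
        {
            get
            {
                if (!done)
                    ReadUntil(int.MaxValue);
                return list.Count;
            }
        }

        // Indexing reads only as far as the requested position.
        public T this[int index]
        {
            get
            {
                ReadUntil(index);
                return list[index];
            }
        }
    }
    ```

    With a three-item source, reading element 0 leaves Cached at 1; reading Count afterwards pulls in the rest, so Cached becomes 3. That mirrors why touching n.Count before the loop changes what the loop sees.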

    Of course, it would not be wise to rely on this undocumented behavior.

    Instead, I advise that you actually copy the contents of the XmlNodeList to a list of your own, and then iterate over that:

    string sFile = "PathFile";
    XmlDocument doc = new XmlDocument();
    doc.Load(sFile);
    
    var allNodes = doc.DocumentElement
        .SelectNodes("//*")
        .OfType<XmlNode>()      // using System.Linq;
        .ToList();
    
    foreach (XmlNode x in allNodes)
    {
        // rest of the code 
    }
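
    For completeness, here is a minimal sketch of how the snapshot approach might look when nodes are actually moved. Since the question does not show the file layout or the loop body, everything specific here is an assumption: the folder element name, the id and parent attributes, and the FolderNesting.Nest helper are all hypothetical. It uses LoadXml on an in-memory string only to keep the example self-contained.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml;

    static class FolderNesting
    {
        // Hypothetical helper: assumes flat <folder id="..." parent="..."/> elements
        // directly under the root, which is NOT taken from the original question.
        public static XmlDocument Nest(string xml)
        {
            var doc = new XmlDocument();
            doc.LoadXml(xml);

            // Snapshot the matches into our own list BEFORE changing the document,
            // so the loop below never depends on the lazily filled XmlNodeList.
            List<XmlElement> folders = doc.DocumentElement
                .SelectNodes("folder")
                .OfType<XmlElement>()
                .ToList();

            // Index the folders by their (assumed unique) id attribute.
            Dictionary<string, XmlElement> byId =
                folders.ToDictionary(f => f.GetAttribute("id"));

            foreach (XmlElement folder in folders)
            {
                // GetAttribute returns an empty string when the attribute is absent.
                string parentId = folder.GetAttribute("parent");
                if (parentId.Length > 0 &&
                    byId.TryGetValue(parentId, out XmlElement parent))
                {
                    // AppendChild moves a node that is already in the tree.
                    parent.AppendChild(folder);
                }
            }

            return doc;
        }
    }
    ```

    Calling FolderNesting.Nest on a root with three flat folders a, b (parent a), and c (parent b) should leave a single folder under the root, with b nested in a and c nested in b. The sketch does not guard against duplicate ids or circular parent references; real data may need both checks.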