I am working on code that restructures an XML file so that subfolder nodes are actually nested within their parent node. In the source XML every folder is a separate child node of the root, instead of the tree you would expect, with subfolders inside their parent folders. This is the piece of code the question is about:
// Load original XML
string sFile = "PathFile";
XmlDocument doc = new XmlDocument();
doc.Load(sFile);
var n = doc.DocumentElement.SelectNodes("//*"); // Select all element nodes into node list n
// int nCount = n.Count; // If this line is uncommented, the code works
foreach (XmlNode x in n)
{
    // rest of the code
}
Now I have the code working properly, but only sometimes, even without changing anything between runs. I have narrowed it down to this: when debugging the code in Visual Studio, it goes wrong if I just run it from beginning to end. If I break halfway and inspect the XmlNodeList n (by hovering over it with the cursor and looking at the element count), it does work. After discovering this I added the line
int nCount = n.Count;
and now the code works when running unsupervised from start to finish.
What is happening here and what is the correct way to address this issue? Note: doc.LoadXml does not work with this particular file.
Thank you loads,
Thomas
The short answer: because of side effects in the implementation of XmlNodeList.
XmlNode.SelectNodes() returns an XmlNodeList (technically, an XPathNodeList), which is a "live list" of the nodes matching the selection (in this case an XPath selection).
As you iterate the XPathNodeList or access it in other ways, it makes its way through the matching nodes, building up an internal list as needed, and returning them as needed.
So if you try to rearrange the document as you are iterating through the nodes, this can foul up the iteration and cause it to stop before you have gone through all of them. The iteration is basically chasing a moving target as the document shifts underneath it.
However, in order to return a value for the Count property, the XPathNodeList basically needs to find every matching node and count them, so it goes through the entire set of matches and places them all in the internal list:
public override int Count {
    get {
        if (! done) {
            ReadUntil(Int32.MaxValue);
        }
        return list.Count;
    }
}
I think this explains what you are seeing. When you access the Count property before making changes, it builds up the entire list of nodes as a side effect, so that list is already populated when you actually iterate through them.
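To make the mechanism concrete, here is a simplified model of a lazily filled list. It is only an illustration and not the real XPathNodeList: the class name LazyList and the CachedCount property are made up for this sketch. It shows how an indexer can read just far enough into the source, while Count has to drain the source completely.

```csharp
using System;
using System.Collections.Generic;

// Simplified model of a lazily populated list (hypothetical; NOT the real XPathNodeList).
public sealed class LazyList<T>
{
    private readonly IEnumerator<T> source;
    private readonly List<T> cache = new List<T>();
    private bool done;

    public LazyList(IEnumerable<T> items)
    {
        source = items.GetEnumerator();
    }

    // Pull items from the source until the cache contains position 'index'
    // or the source is exhausted.
    private void ReadUntil(int index)
    {
        while (!done && cache.Count <= index)
        {
            if (source.MoveNext())
                cache.Add(source.Current);
            else
                done = true;
        }
    }

    // Counting requires reading the whole source, which fills the cache as a side effect.
    public int Count
    {
        get
        {
            if (!done) ReadUntil(int.MaxValue);
            return cache.Count;
        }
    }

    // Number of items read so far; exposed only to make the laziness visible.
    public int CachedCount
    {
        get { return cache.Count; }
    }

    // Reads only as far as needed to reach 'index'.
    public T this[int index]
    {
        get
        {
            ReadUntil(index);
            return cache[index];
        }
    }
}
```

In this model, reading element 0 caches a single item, whereas reading Count caches all of them. That matches the effect described above: once everything is cached, later changes to the source no longer influence what the list hands out.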
Of course, it would not be wise to rely on this undocumented behavior.
Instead, I advise that you actually copy the contents of the XmlNodeList to a list of your own, and then iterate over that:
string sFile = "PathFile";
XmlDocument doc = new XmlDocument();
doc.Load(sFile);
var allNodes = doc.DocumentElement
    .SelectNodes("//*")
    .OfType<XmlNode>() // using System.Linq;
    .ToList();
foreach (XmlNode x in allNodes)
{
    // rest of the code
}
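For completeness, here is a rough sketch of how the snapshot approach could be combined with the actual re-nesting. The question does not show the XML schema, so everything about the data is an assumption: the sketch pretends each folder is a folder element with a unique id attribute and an optional parent attribute holding the id of its parent folder. The names FolderNester and Nest are made up as well.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class FolderNester
{
    // Assumed (hypothetical) input shape: <folder id="..." parent="..."/> elements,
    // all direct children of the root, with unique ids and no cyclic parent references.
    public static void Nest(XmlDocument doc)
    {
        // Take a snapshot first, so that moving nodes cannot disturb the iteration.
        List<XmlElement> folders = doc.DocumentElement
            .SelectNodes("//folder")
            .OfType<XmlElement>()
            .ToList();

        // Index the folders by id (ToDictionary throws if an id occurs twice).
        Dictionary<string, XmlElement> byId =
            folders.ToDictionary(f => f.GetAttribute("id"));

        foreach (XmlElement folder in folders)
        {
            // GetAttribute returns an empty string when the attribute is missing.
            string parentId = folder.GetAttribute("parent");
            XmlElement parent;
            if (parentId.Length > 0 && byId.TryGetValue(parentId, out parent))
            {
                // AppendChild moves a node that is already in the tree:
                // it is removed from its old position first.
                parent.AppendChild(folder);
            }
        }
    }
}
```

Because the loop runs over the copied list rather than the live XmlNodeList, every folder is visited exactly once, no matter how the document is rearranged along the way.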