c# xml

XmlNode.SelectSingleNode always returns the same value even though the node changes


I am reading in a bunch of XML files, transforming them, and loading the data into another system.

Previously I had done this using ThreadPool; however, the provider of the files, and therefore their structure, has changed, so I'm now trying async/await and getting an odd result.

As I process the files, I get a list of XmlNodes and loop over them:

foreach (XmlNode currentVenue in venueNodes)
{
      Console.WriteLine(currentVenue.OuterXml);
      Console.WriteLine(currentVenue.SelectSingleNode(@"//venueName").InnerText);
}

However, the second WriteLine always prints the value expected for the first node. For example:

<venue venueID="xartrix" lastModified="2012-08-20 10:49:30"><venueName>Artrix</venueName></venue>
Artrix
<venue venueID="xbarins" lastModified="2013-04-29 11:39:07"><venueName>The Barber Institute Of Fine Arts, University Of Birmingham</venueName></venue>
Artrix
<venue venueID="xbirmus" lastModified="2012-11-13 16:41:13"><venueName>Birmingham Museum &amp; Art Gallery</venueName></venue>
Artrix

Here is the complete code:

public async Task ProcessFiles()
{
    string[] filesToProcess = Directory.GetFiles(_filePath);
    List<Task> tasks = new List<Task>();

    foreach (string currentFile in filesToProcess)
    {
        tasks.Add(Task.Run(()=>processFile(currentFile)));
    }

    await Task.WhenAll(tasks);

}

private async Task processFile(string currentFile)
{
    try
    {
         XmlDocument currentXmlFile = new XmlDocument();
         currentXmlFile.Load(currentFile);

         //select nodes for processing
         XmlNodeList venueNodes = currentXmlFile.SelectNodes(@"//venue");

         foreach (XmlNode currentVenue in venueNodes)
         {
              Console.WriteLine(currentVenue.OuterXml);
              Console.WriteLine(currentVenue.SelectSingleNode(@"//venueName").InnerText);
         }
     }
     catch (Exception e)
     {
         Console.WriteLine(e.Message);
     }
 }

Obviously I've missed something, but I cannot see what. Can someone point it out, please?


Solution

  • SelectSingleNode returns only the first matching node in document order. @jbl is correct: //venueName starts from the document root. In XPath, a leading // is shorthand for /descendant-or-self::node()/, so the expression is absolute and ignores the node you call it on.

    I work with XML and XPath often, and this is a common mistake. You need to make sure your context node is correct when calling SelectSingleNode. As noted above, //venueName gets the first <venueName /> node in document order, starting from the root of the document.

    To get the <venueName /> node that belongs to the node you're currently iterating over, use the following code:

    foreach (XmlNode currentVenue in venueNodes)
    {
        Console.WriteLine(currentVenue.OuterXml);
        // The '.' at the start of the XPath expression passed to
        // SelectSingleNode below means "from the current node".
        // Without it, searching starts from the document root, and
        // not from currentVenue.
        Console.WriteLine(
            currentVenue.SelectSingleNode(@".//venueName").InnerText
        ); 
    }
    

    That should solve your problem.
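
To see the difference between the absolute and relative expressions side by side, here is a minimal sketch. The class name VenueXPathDemo, the helper NamesFor, the <venues> wrapper element, and the trimmed-down sample document are all made up for illustration; only XmlDocument, SelectNodes, and SelectSingleNode come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class VenueXPathDemo
{
    // Hypothetical sample document, modelled on the output shown in the question.
    public const string SampleXml =
        "<venues>" +
        "<venue venueID=\"xartrix\"><venueName>Artrix</venueName></venue>" +
        "<venue venueID=\"xbirmus\"><venueName>Birmingham Museum &amp; Art Gallery</venueName></venue>" +
        "</venues>";

    // Evaluates the given XPath with each <venue> node as the context node
    // and returns the InnerText of the first match for each one
    // (null when nothing matches).
    public static List<string> NamesFor(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var names = new List<string>();
        foreach (XmlNode venue in doc.SelectNodes("//venue"))
        {
            XmlNode match = venue.SelectSingleNode(xpath);
            names.Add(match == null ? null : match.InnerText);
        }
        return names;
    }
}
```

With this sample, NamesFor(SampleXml, "//venueName") gives "Artrix" for both venues, while NamesFor(SampleXml, ".//venueName") gives "Artrix" and then "Birmingham Museum & Art Gallery". Because <venueName /> is a direct child of <venue /> here, the shorter relative expression "venueName" returns the same result as ".//venueName".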