c# xml text xmlreader

How to check whether a text node in an XML file has a child node inside it, and get all the data from both?


I have an XML file. This is just a piece of it:

1    <mainTerm>
2      <title> Abandonment </title>
3      <see> Maltreatment </see>
4    </mainTerm>
5    <mainTerm>
6      <title> Abasia <nemod>(-astasia) (hysterical) </nemod></title>
7      <code>F44.4</code>
8    </mainTerm>

I have a lot of <mainTerm> elements and I loop through all of them. I copy the data of each element, but when I reach line 6 I run into a problem. How do I copy all of that content? In the end I need a string that looks like "Abasia (-astasia) (hysterical)".

This is the piece of my app that works with that file:

    List<string> nodes = new List<string>();

    // Create XmlReaderSettings object
    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreWhitespace = true;
    settings.IgnoreComments = true;

    // Create XmlReader object
    XmlReader xmlIn = XmlReader.Create(path, settings);

    Excel.Application xlApp;
    Excel.Workbook xlWorkBook;
    Excel.Worksheet xlWorkSheet;
    object misValue = System.Reflection.Missing.Value;

    xlApp = new Excel.Application();
    xlWorkBook = xlApp.Workbooks.Add(misValue);

    if (xmlIn.ReadToDescendant("mainTerm"))
    {
        do
        {
            xmlIn.ReadStartElement("mainTerm");

            nodes.Add(xmlIn.ReadElementContentAsString());

            nodes.Add(xmlIn.ReadElementContentAsString());

        } while (xmlIn.ReadToNextSibling("mainTerm"));
    }

Solution

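  • You could use LINQ to XML. Wrap your XML fragment in a root node and fetch all title elements as shown here. The Value property of an element returns the concatenated text of the element and all of its descendants, so the text inside the nested nemod element is included automatically: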

    using System.Linq;
    using System.Xml.Linq;

    var xmlSrc = @"<?xml version=""1.0"" encoding=""UTF-8""?><xml><mainTerm>
      <title> Abandonment </title>
      <see> Maltreatment </see>
    </mainTerm>
    <mainTerm>
      <title> Abasia <nemod>(-astasia) (hysterical) </nemod></title>
      <code>F44.4</code>
    </mainTerm></xml>";
    
    var xml = XDocument.Parse(xmlSrc);
    // All mainTerm elements directly under the root
    var mainTerms = xml.Root.Elements("mainTerm").ToList();
    // All title elements directly under those mainTerm elements
    var titles = mainTerms.Elements("title").ToList();
    foreach (var title in titles)
    {
        // Value includes the text of nested elements such as nemod
        System.Console.WriteLine(title.Value);
    }
    

    The output is:

    Abandonment 
    Abasia (-astasia) (hysterical) 
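
    Each value keeps the whitespace that surrounds the text in the XML. To get exactly the string asked for in the question, the values can be trimmed. A minimal sketch, assuming the same sample data; the class TitleReader and the method ReadTitles are names invented for this illustration:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    public static class TitleReader
    {
        // Hypothetical helper: returns the trimmed text of every title,
        // including text nested in child elements such as nemod.
        public static List<string> ReadTitles(string xmlSrc)
        {
            return XDocument.Parse(xmlSrc)
                .Root
                .Elements("mainTerm")
                .Elements("title")
                .Select(t => t.Value.Trim())
                .ToList();
        }

        public static void Main()
        {
            var xmlSrc =
                "<xml>" +
                "<mainTerm><title> Abandonment </title><see> Maltreatment </see></mainTerm>" +
                "<mainTerm><title> Abasia <nemod>(-astasia) (hysterical) </nemod></title><code>F44.4</code></mainTerm>" +
                "</xml>";

            foreach (var title in ReadTitles(xmlSrc))
            {
                // Brackets make any leftover whitespace visible
                Console.WriteLine("[" + title + "]");
            }
        }
    }
    ```

    With the sample above this should print [Abandonment] and [Abasia (-astasia) (hysterical)].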
    

    This is IMHO much easier than XPath and XmlReader.
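
    If the file is large and you would rather keep the streaming XmlReader loop from the question, the two approaches can be combined: XNode.ReadFrom materializes one mainTerm at a time as an XElement, and Value then works as above. This is only a sketch under that assumption; StreamingTitles and ReadTitles are invented names, and the sample string stands in for the real file:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Xml;
    using System.Xml.Linq;

    public static class StreamingTitles
    {
        // Hypothetical helper: streams the input and returns the trimmed
        // title text of each mainTerm, nested elements included.
        public static List<string> ReadTitles(TextReader input)
        {
            var titles = new List<string>();
            var settings = new XmlReaderSettings
            {
                IgnoreWhitespace = true,
                IgnoreComments = true
            };

            using (XmlReader reader = XmlReader.Create(input, settings))
            {
                while (!reader.EOF)
                {
                    if (reader.NodeType == XmlNodeType.Element && reader.Name == "mainTerm")
                    {
                        // ReadFrom consumes the whole element and leaves the
                        // reader on the node after it, so no extra Read() here.
                        var mainTerm = (XElement)XNode.ReadFrom(reader);
                        var title = mainTerm.Element("title");
                        if (title != null)
                        {
                            titles.Add(title.Value.Trim());
                        }
                    }
                    else
                    {
                        reader.Read();
                    }
                }
            }
            return titles;
        }

        public static void Main()
        {
            var xmlSrc =
                "<xml>" +
                "<mainTerm><title> Abandonment </title><see> Maltreatment </see></mainTerm>" +
                "<mainTerm><title> Abasia <nemod>(-astasia) (hysterical) </nemod></title><code>F44.4</code></mainTerm>" +
                "</xml>";

            foreach (var title in ReadTitles(new StringReader(xmlSrc)))
            {
                Console.WriteLine(title);
            }
        }
    }
    ```

    Only one mainTerm is held in memory at a time, which is the main reason to prefer this over XDocument.Parse for very large inputs. For a file on disk, XmlReader.Create(path, settings) as in the question would replace the StringReader.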


    Using the Descendants method, the mainTerm elements do not need to be direct children of the root element:

    var mainTerms = xml.Root.Descendants("mainTerm").ToList();
    

    This line returns all mainTerm elements at any depth in the XML document.
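
    To illustrate the difference between Elements and Descendants, here is a small sketch. The letter wrapper element is invented for this example and is not taken from the original file:

    ```csharp
    using System;
    using System.Linq;
    using System.Xml.Linq;

    public static class DescendantsDemo
    {
        // Hypothetical helper: counts mainTerm elements directly under the root.
        public static int CountDirect(XDocument xml)
        {
            return xml.Root.Elements("mainTerm").Count();
        }

        // Hypothetical helper: counts mainTerm elements at any depth below the root.
        public static int CountAnyDepth(XDocument xml)
        {
            return xml.Root.Descendants("mainTerm").Count();
        }

        public static void Main()
        {
            // Assumed layout: mainTerm is nested one level deeper, inside letter.
            var xml = XDocument.Parse(
                "<xml><letter>" +
                "<mainTerm><title> Abasia <nemod>(-astasia) (hysterical) </nemod></title></mainTerm>" +
                "</letter></xml>");

            Console.WriteLine(CountDirect(xml));   // Elements only looks one level down
            Console.WriteLine(CountAnyDepth(xml)); // Descendants searches the whole subtree
        }
    }
    ```

    With this input, Elements finds no mainTerm because the only one sits inside letter, while Descendants finds it.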