Search code examples
c#xmlxml-parsingxml-database

Best way to represent n-level deep nested XMLnodes with same name?


I'm writing an XML parser in C# .Net and for this particular chunk of XML I'm having a difficult time conceptualizing how I would parse and store it's relationship in a database. See sample xml below:

<Article name="x" date="y">
    <Words>
        <Category ID="1000" Name="Person">
            <Keyword ID="1124" Name="Adult">
                <Keyword ID="1125" Name="Female" />
            </Keyword>
        </Category>
        <Category ID="1000" Name="Person">
            <Keyword ID="1124" Name="Adult">
                  <Keyword ID="1126" Name="Male" />
            </Keyword>
        </Category>
    </Words>
</Article>

So you can see there are N Categories each containing N nested keyword nodes. What is the best way to represent this relational data?

I have a master XML doc of all the possible Keyword and Category combinations. This is my first thought but please correct me if there is a better way:

  1. Table for all Keywords/Categories: ID, Name, ParentID
  2. Table for Article: ID, Name, Date,
  3. Go to bottom most keyword node in each category node and add to lookup table: ID, ArticleID, KeywordID

Then I could just do a join and recursively find all the parents of the bottom most keyword node.


Solution

  • If you want to get only bottom most keywords, then you can simply check if keyword element contains any child nodes before selecting it:

    var xdoc = XDocument.Load(path_to_xml);
    var keywords = from k in xdoc.Descendants("Keyword")
                   where !k.Elements().Any()
                   select new
                   {
                       ID = (int)k.Attribute("ID"),
                       Name = (string)k.Attribute("Name")
                   };
    

    Output:

    { ID = 1125, Name = Female }
    { ID = 1126, Name = Male }
    

    UPDATE: XPath solution to get leaf Keyword elements

    var keywords = xdoc.XPathSelectElements("//Keyword[not(*)]");