I'm writing an XML parser in C# .Net and for this particular chunk of XML I'm having a difficult time conceptualizing how I would parse and store it's relationship in a database. See sample xml below:
<Article name="x" date="y">
<Words>
<Category ID="1000" Name="Person">
<Keyword ID="1124" Name="Adult">
<Keyword ID="1125" Name="Female" />
</Keyword>
</Category>
<Category ID="1000" Name="Person">
<Keyword ID="1124" Name="Adult">
<Keyword ID="1126" Name="Male" />
</Keyword>
</Category>
</Words>
</Article>
So you can see there are N Categories each containing N nested keyword nodes. What is the best way to represent this relational data?
I have a master XML doc of all the possible Keyword and Category combinations. This is my first thought but please correct me if there is a better way:
Then I could just do a join and recursively find all the parents of the bottom most keyword node.
If you want to get only bottom most keywords, then you can simply check if keyword element contains any child nodes before selecting it:
var xdoc = XDocument.Load(path_to_xml);
var keywords = from k in xdoc.Descendants("Keyword")
where !k.Elements().Any()
select new
{
ID = (int)k.Attribute("ID"),
Name = (string)k.Attribute("Name")
};
Output:
{ ID = 1125, Name = Female }
{ ID = 1126, Name = Male }
UPDATE: XPath solution to get leaf Keyword elements
var keywords = xdoc.XPathSelectElements("//Keyword[not(*)]");