
Robust LINQ to XML query for sibling key-value pairs


I am just learning about LINQ to XML in all its glory and frailty, trying to hack it to do what I want to do:

Given an XML file like this -

<list>
<!-- random data, keys, values, etc.-->

  <key>FIRST_WANTED_KEY</key>
  <value>FIRST_WANTED_VALUE</value>
  
  <key>SECOND_WANTED_KEY</key>
  <value>SECOND_WANTED_VALUE</value> <!-- wanted because it's first -->

  <key>SECOND_WANTED_KEY</key>
  <value>UNWANTED_VALUE</value>  <!-- not wanted because it's second -->

  <!-- nonexistent <key>THIRD_WANTED_KEY</key> -->
  <!-- nonexistent <value>THIRD_WANTED_VALUE</value> -->

<!-- more stuff-->
</list>

I want to extract the values of a set of known "wanted keys" in a robust fashion. If SECOND_WANTED_KEY appears twice, I want only SECOND_WANTED_VALUE, not UNWANTED_VALUE. THIRD_WANTED_KEY may or may not appear, so the query must handle a missing key as well. I can assume that FIRST_WANTED_KEY appears before the other keys, but I can't assume anything about the order of the rest. If a key appears more than once, only its first value matters. An anonymous type whose members are strings is fine for the result.

My attempt has centered around something along these lines:

var z = from y in x.Descendants()
        where y.Value == "FIRST_WANTED_KEY"
        select new
        {
          first_wanted_value = ((XElement)y.NextNode).Value,
         //...
        }

My question is what should that ... be? I've tried, for instance, (ugly, I know)

second_wanted_value = ((XElement)y.ElementsAfterSelf()
                      .Where(w => w.Value=="SECOND_WANTED_KEY")
                      .FirstOrDefault().NextNode).Value

which should hopefully allow the key to be anywhere, or not to exist at all. That hasn't worked out, though: when no element matches, FirstOrDefault() returns null, and accessing .NextNode on that null reference throws a NullReferenceException.

I've also tried to add in a

.Select(t => { 
    if (t==null) 
        return new XElement("SECOND_WANTED_KEY",""); 
    else return t;
})

clause after the Where, but that hasn't worked either.
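A likely reason the Select clause has no effect: when nothing matches, Where yields an empty sequence, so Select is never called with a null. The null only appears later, when FirstOrDefault runs on that empty sequence. A minimal sketch of one way around this, assuming a recent C# compiler with top-level statements; the inline sample and the helper name ValueOrEmpty are illustrative, not part of the original code. It projects each matching key to its value first and only then substitutes a default with DefaultIfEmpty:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Illustrative sample; THIRD_WANTED_KEY is deliberately absent.
var list = XElement.Parse(@"<list>
  <key>FIRST_WANTED_KEY</key><value>FIRST_WANTED_VALUE</value>
  <key>SECOND_WANTED_KEY</key><value>SECOND_WANTED_VALUE</value>
  <key>SECOND_WANTED_KEY</key><value>UNWANTED_VALUE</value>
</list>");

// Hypothetical helper: value paired with the first matching key, or "" when the key is absent.
string ValueOrEmpty(XElement parent, string wantedKey) =>
    parent.Elements("key")
          .Where(k => (string)k == wantedKey)
          // The cast yields null instead of throwing when no <value> follows the key.
          .Select(k => (string)k.ElementsAfterSelf("value").FirstOrDefault())
          // Supplies a single "" only when the filtered sequence is empty.
          .DefaultIfEmpty("")
          .First();

string second = ValueOrEmpty(list, "SECOND_WANTED_KEY");
string third = ValueOrEmpty(list, "THIRD_WANTED_KEY");
Console.WriteLine("second=" + second + ", third='" + third + "'");
```

Because the default is applied to the sequence of values rather than to a substitute XElement, there is no detached element whose NextNode would itself be null.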

I'm open to suggestions, (constructive) criticism, links, references, or suggestions of phrases to search for, etc. I've done a fair share of research already.

Edit

Let me add a layer of complexity to this - I should have included this in the first place. Let's say the XML document looks like this:

<lists>
    <list>
      <!-- as above -->
    </list>
    <list>
      <!-- as above -->
    </list>
</lists>

and I want to extract multiple sets of these key-value pairs. Question/Caution: if SECOND_WANTED_KEY doesn't appear in the first <list> element but appears in the second, I don't want to accidentally pick up the second list element's SECOND_WANTED_KEY.
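One way to avoid picking up a key from a neighbouring list is to query each <list> element's direct children with Elements instead of searching the whole document with Descendants. A minimal sketch under that assumption; the sample document, the helper name FirstValue, and the short values A1, B1, B2 are illustrative only, and the `?.` operator requires C# 6 or later:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Illustrative document: the first <list> lacks SECOND_WANTED_KEY.
var doc = XDocument.Parse(@"<lists>
  <list>
    <key>FIRST_WANTED_KEY</key><value>A1</value>
  </list>
  <list>
    <key>FIRST_WANTED_KEY</key><value>B1</value>
    <key>SECOND_WANTED_KEY</key><value>B2</value>
  </list>
</lists>");

// Hypothetical helper: looks only at direct children of one <list>;
// returns null when the key (or its following <value>) is absent.
string FirstValue(XElement parent, string wantedKey) =>
    (string)parent.Elements("key")
                  .FirstOrDefault(k => (string)k == wantedKey)
                  ?.ElementsAfterSelf("value")
                  .FirstOrDefault();

// One anonymous object per <list>, each built from that list's own children.
var rows = (from list in doc.Root.Elements("list")
            select new
            {
                first_wanted_value = FirstValue(list, "FIRST_WANTED_KEY"),
                second_wanted_value = FirstValue(list, "SECOND_WANTED_KEY")
            }).ToList();

foreach (var row in rows)
    Console.WriteLine(row.first_wanted_value + " / " + (row.second_wanted_value ?? "(missing)"));
```

ElementsAfterSelf only returns siblings, so a lookup that starts inside one <list> cannot reach into the next one.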

Edit #2

As another idea, I've tried creating a HashSet of the keys that I'm looking for and doing this:

HashSet<string> wantedKeys = new HashSet<string>();
wantedKeys.Add("FIRST_WANTED_KEY");
//...add more keys here
var kvp = from a in x.Descendants().Where(a => wantedKeys.Contains(a.Value))
          select new KeyValuePair<string,string>(a.Value,
             ((XElement)a.NextNode).Value);

This gets me all of the key-value pairs, but I'm not sure whether it guarantees that each pair is properly associated with its parent <list> element. Any thoughts on, or comparisons between, these two approaches would be helpful.
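The association concern can be addressed by running the hash-set filter once per <list> rather than over all descendants. A minimal sketch, assuming the keys are direct children of each <list>; the sample data and the variable name perList are illustrative. In LINQ to Objects, GroupBy keeps elements in their original order within each group, so taking the first member of each group keeps the first occurrence of a duplicated key:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Illustrative document with a duplicated key and an unwanted key.
var doc = XDocument.Parse(@"<lists>
  <list>
    <key>FIRST_WANTED_KEY</key><value>A1</value>
    <key>SECOND_WANTED_KEY</key><value>A2</value>
    <key>SECOND_WANTED_KEY</key><value>UNWANTED</value>
    <key>OTHER_KEY</key><value>IGNORED</value>
  </list>
  <list>
    <key>FIRST_WANTED_KEY</key><value>B1</value>
  </list>
</lists>");

var wantedKeys = new HashSet<string> { "FIRST_WANTED_KEY", "SECOND_WANTED_KEY", "THIRD_WANTED_KEY" };

// One dictionary per <list>, so every pair stays tied to its parent element.
var perList = doc.Root.Elements("list")
    .Select(list => list.Elements("key")
        .Where(k => wantedKeys.Contains(k.Value))
        // Collapse duplicates; members of each group stay in document order.
        .GroupBy(k => k.Value)
        .ToDictionary(
            g => g.Key,
            // First occurrence wins; a key with no following <value> maps to null.
            g => (string)g.First().ElementsAfterSelf("value").FirstOrDefault()))
    .ToList();

foreach (var dict in perList)
    Console.WriteLine(string.Join(", ", dict.Select(p => p.Key + "=" + p.Value)));
```

A wanted key that is missing from a list simply has no entry in that list's dictionary, which can be checked with TryGetValue.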

Status Update 4/9/10

As of right now I still think the hash set approach is preferable. It seems that most of .NET's XML processing happens in document order, and so far all of my test cases have worked out.


Solution

  • This gets the value of the first <value> element after the first <key> element containing "SECOND_WANTED_KEY":

    XDocument doc; // assumed to be loaded already, e.g. via XDocument.Load or XDocument.Parse
    
    string result = (string)doc.Root
                               .Elements("key")
                               .First(node => (string)node == "SECOND_WANTED_KEY")
                               .ElementsAfterSelf("value")
                               .First();
    

    Add null checks as desired.
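As written, First throws an InvalidOperationException when the key or the value is missing. One possible shape for the null checks, as a sketch only: the helper name Lookup and the inline sample are illustrative, and returning null for a missing key is an assumption about the desired behaviour.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Illustrative sample with a duplicated key and no THIRD_WANTED_KEY.
var doc = XDocument.Parse(@"<list>
  <key>SECOND_WANTED_KEY</key><value>SECOND_WANTED_VALUE</value>
  <key>SECOND_WANTED_KEY</key><value>UNWANTED_VALUE</value>
</list>");

// Hypothetical helper: returns null instead of throwing when anything is missing.
string Lookup(XDocument d, string wantedKey)
{
    if (d.Root == null) return null;                 // empty document

    XElement keyElement = d.Root.Elements("key")
                                .FirstOrDefault(node => (string)node == wantedKey);
    if (keyElement == null) return null;             // key absent

    XElement valueElement = keyElement.ElementsAfterSelf("value").FirstOrDefault();
    return valueElement == null ? null : valueElement.Value;  // value absent
}

string found = Lookup(doc, "SECOND_WANTED_KEY");
string missing = Lookup(doc, "THIRD_WANTED_KEY");
Console.WriteLine(found + " / " + (missing ?? "(null)"));
```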