
Best/fastest way to find the values of an element in an XML file


What my program basically does is search through XML files and return the filenames of those that have specific values in an element.

I should show you my XML first before I continue:

 <DocumentElement>
   <Protocol>
     <DateTime>10.03.2003</DateTime>
     <Item>Date</Item>
     <Value />
   </Protocol>
   <Protocol>
     <DateTime>05.11.2020</DateTime>
     <Item>Status</Item>
     <Value>Ok</Value>
   </Protocol>
 </DocumentElement>

I have a few thousand XML files which have this exact layout. The user can get a list of all the matching files with the following method:

public List<string> GetFiles(string itemValue, string element, string value)
{
    return compatibleFiles.Where(path => XmlHasValue(path, itemValue, element, value)).ToList();
}

And this method returns whether the XML has the wanted value or not:

private bool XmlHasValue(string filePath, string itemValue, string element, string value)
{
    try
    {
        string foundValue = XDocument.Load(filePath)
            .Descendants()
            .Where(el => el.Name == "Item" && el.Value == itemValue)
            .First()
            .Parent
            .Descendants()
            .Where(des => des.Name == element && des.Value == value)
            .First()
            .Value;
        return foundValue == value;
    }
    catch (Exception)
    {
        return false;
    }
}

compatibleFiles is a list of the paths to all XML files that have the correct layout (the XML shown above). The user passes the following to the GetFiles method:

  • itemValue -> the value the 'Item' element should have, "Status" for example
  • element -> the name of the element they want to check (in the same 'Protocol' element), e.g. "Value" or "DateTime"
  • value -> the value that element should have, "Ok" in our example

The problem is that these methods take a long time to complete, and I'm almost certain there's a better and faster way to do what I want. I don't know if GetFiles can get any faster, but XmlHasValue surely can. Here are some test results:

[Image: test results (not included)]

Do you guys know any faster way to do this? It would be really helpful.

UPDATE

It turns out that it was all just because of the IO thread. If you have the same problem and think your code is bad, first check whether a single thread is simply using all the CPU power.


Solution

  • As @Sinatr mentions, profiling should always be the first step when investigating performance.

    A reasonable guess about what takes time would be

    1. IO
    2. Parsing
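
    Before reaching for a full profiler, a rough check of which of the two dominates can be done with a Stopwatch. The sketch below is only an illustration: the LoadTiming class and its Measure method are hypothetical names, not part of the question's code. It reads the file into memory first and then parses from memory, so the two costs are timed separately. Note that the operating system's file cache can make repeated runs report much lower IO times than a cold start.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml.Linq;

// Hypothetical helper, not from the original question.
public static class LoadTiming
{
    // Returns the time spent reading the file and the time spent parsing it.
    public static (TimeSpan Io, TimeSpan Parse) Measure(string filePath)
    {
        var stopwatch = Stopwatch.StartNew();
        byte[] bytes = File.ReadAllBytes(filePath);   // disk IO only
        TimeSpan io = stopwatch.Elapsed;

        stopwatch.Restart();
        using (var stream = new MemoryStream(bytes))
        {
            XDocument.Load(stream);                   // parsing only, from memory
        }
        TimeSpan parse = stopwatch.Elapsed;

        return (io, parse);
    }
}
```

    Summing both values over a representative set of files should indicate whether a faster parser or less IO is the more promising direction.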

    IO could be improved by getting a faster disk, or by caching results in RAM. The latter may greatly improve performance if multiple searches are done, but introduces issues like cache invalidation.
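
    As a minimal sketch of the caching idea, assuming the parsed documents fit in memory, the following keeps each XDocument together with the file's last write time and reloads the file only when that timestamp changes. The XmlCache class is a hypothetical name introduced for illustration. The timestamp check is a simple form of invalidation and still costs one metadata lookup per file.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Xml.Linq;

// Hypothetical cache, not from the original question.
public static class XmlCache
{
    private static readonly ConcurrentDictionary<string, (DateTime Stamp, XDocument Doc)> cache =
        new ConcurrentDictionary<string, (DateTime Stamp, XDocument Doc)>();

    public static XDocument Load(string filePath)
    {
        // Cheap metadata lookup; the file content is only read on a miss.
        DateTime stamp = File.GetLastWriteTimeUtc(filePath);

        if (cache.TryGetValue(filePath, out var entry) && entry.Stamp == stamp)
        {
            return entry.Doc;               // file unchanged since it was cached
        }

        XDocument doc = XDocument.Load(filePath);
        cache[filePath] = (stamp, doc);     // add or replace the stale entry
        return doc;
    }
}
```

    In the question's XmlHasValue, XDocument.Load(filePath) could then be swapped for XmlCache.Load(filePath), so only the first search pays for disk access and parsing.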

    According to "What is the best way to parse (big) XML in C# Code", XmlReader is the fastest way to parse XML. This blog suggests XmlReader is about 2.5 times faster.
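
    For illustration, here is one way XmlHasValue might look with XmlReader. It is a sketch and not a drop-in replacement: it assumes the flat layout from the question (each 'Protocol' contains only simple text elements), and the XmlSearch class name is hypothetical. Like the original, it decides on the first 'Protocol' whose 'Item' matches. Unlike the original, it only catches XmlException, so a missing file would still throw.

```csharp
using System.Xml;

// Hypothetical class name; the method mirrors the question's signature.
public static class XmlSearch
{
    public static bool XmlHasValue(string filePath, string itemValue, string element, string value)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(filePath))
            {
                while (reader.ReadToFollowing("Protocol"))
                {
                    bool itemMatches = false;
                    bool elementMatches = false;

                    using (XmlReader protocol = reader.ReadSubtree())
                    {
                        protocol.Read();   // positions on <Protocol>
                        protocol.Read();   // steps to its first child node, if any
                        while (!protocol.EOF)
                        {
                            if (protocol.NodeType == XmlNodeType.Element)
                            {
                                string name = protocol.LocalName;
                                // Reads the text and moves past the end tag.
                                // Throws XmlException on nested elements.
                                string text = protocol.ReadElementContentAsString();
                                if (name == "Item" && text == itemValue) itemMatches = true;
                                if (name == element && text == value) elementMatches = true;
                            }
                            else
                            {
                                protocol.Read();   // skip whitespace and end tags
                            }
                        }
                    }

                    // Same as the original: only the first matching Item counts.
                    if (itemMatches) return elementMatches;
                }
            }
            return false;
        }
        catch (XmlException)
        {
            return false;   // malformed XML or an unexpected layout
        }
    }
}
```

    Because the reader streams through the file, it can return as soon as the relevant 'Protocol' has been read, without building a tree for the rest of the document.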

    If you have multiple files you could also try to process several of them in parallel. Keep in mind that IO is mostly serial, so you might not gain anything unless you have an SSD that can deliver data faster than the files can be processed.
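
    A minimal sketch of the parallel variant using PLINQ follows. The ParallelSearch class and the hasValue parameter are assumptions made for illustration; hasValue stands in for a call such as path => XmlHasValue(path, itemValue, element, value). Whether this actually helps depends on the disk, so it is worth measuring before and after.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper, not from the original question.
public static class ParallelSearch
{
    public static List<string> GetFiles(IEnumerable<string> compatibleFiles, Func<string, bool> hasValue)
    {
        return compatibleFiles
            .AsParallel()      // check files on several threads
            .AsOrdered()       // keep the results in the input order
            .Where(hasValue)
            .ToList();
    }
}
```

    The predicate must be safe to call from several threads at once. AsOrdered can be dropped if the order of the returned paths does not matter.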