My program searches through XML files and returns the filenames of those that have specific values in an element.
I guess I have to show you my XML first before I can continue:
<DocumentElement>
    <Protocol>
        <DateTime>10.03.2003</DateTime>
        <Item>Date</Item>
        <Value />
    </Protocol>
    <Protocol>
        <DateTime>05.11.2020</DateTime>
        <Item>Status</Item>
        <Value>Ok</Value>
    </Protocol>
</DocumentElement>
I have a few thousand XML files which have this exact layout. The user can get a list of all the matching files with the following method:
public List<string> GetFiles(string itemValue, string element, string value)
{
    return compatibleFiles.Where(path => XmlHasValue(path, itemValue, element, value)).ToList();
}
And this method returns whether the XML has the wanted value or not:
private bool XmlHasValue(string filePath, string itemValue, string element, string value)
{
    try
    {
        string foundValue = XDocument.Load(filePath)
            .Descendants()
            .Where(el => el.Name == "Item" && el.Value == itemValue)
            .First()
            .Parent
            .Descendants()
            .Where(des => des.Name == element && des.Value == value)
            .First()
            .Value;
        return foundValue == value;
    }
    catch (Exception)
    {
        return false;
    }
}
compatibleFiles is a list with all the paths to XML files that have the correct layout/format (XML code above). The user provides the GetFiles method with the following:
itemValue -> the value the 'Item' element should have, "Status" for example
element -> the name of the element they want to check (in the same 'Protocol' element), e.g. "Value" or "DateTime"
value -> the value that element should have, "Ok" in our example
The problem is that these methods take a long time to complete, and I'm almost certain there's a better and faster way to do what I want. I don't know if GetFiles can get any faster, but XmlHasValue sure can. Here are some test results:
Do you guys know any faster way to do this? It would be really helpful.
UPDATE
It turns out that it was all just because of the IO thread. If you have the same problem and think your code is bad, you should first check whether it's just a thread using all the CPU power.
As @Sinatr mentions, profiling should always be the first step when investigating performance.
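If no profiler is at hand, a rough first measurement can be made with Stopwatch. The sketch below separates the time spent reading a file from the time spent parsing it; the class name LoadTimer and the method Measure are made up for illustration. Keep in mind that the operating system caches files, so a second run over the same file will usually show much lower IO times.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml.Linq;

// Hypothetical helper, not part of the original code.
public static class LoadTimer
{
    // Returns the time spent reading the file and the time spent parsing it.
    public static (TimeSpan Io, TimeSpan Parse) Measure(string filePath)
    {
        var sw = Stopwatch.StartNew();
        byte[] bytes = File.ReadAllBytes(filePath);   // disk IO only
        TimeSpan io = sw.Elapsed;

        sw.Restart();
        using (var stream = new MemoryStream(bytes))
        {
            XDocument.Load(stream);                   // parsing only, data is already in RAM
        }
        return (io, sw.Elapsed);
    }
}
```

Summing both values over all entries of compatibleFiles should give a first idea of whether the disk or the parser dominates.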
A reasonable guess about what takes time would be IO (reading the files from disk) and parsing the XML.
IO could be improved by getting a faster disk, or by caching results in RAM. The latter may greatly improve performance if multiple searches are done, but introduces issues like cache invalidation.
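A minimal sketch of such a cache, assuming the parsed documents fit in memory: each XDocument is kept together with the file's last write time and reloaded only when that timestamp changes. XmlCache and Get are hypothetical names, and comparing timestamps is only one simple way to handle invalidation.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Xml.Linq;

// Hypothetical cache, not part of the original code.
public static class XmlCache
{
    private static readonly ConcurrentDictionary<string, (DateTime Stamp, XDocument Doc)> Cache
        = new ConcurrentDictionary<string, (DateTime Stamp, XDocument Doc)>();

    public static XDocument Get(string filePath)
    {
        // Reading the timestamp still touches the file system,
        // but it is much cheaper than reading and parsing the whole file.
        DateTime stamp = File.GetLastWriteTimeUtc(filePath);

        if (Cache.TryGetValue(filePath, out var entry) && entry.Stamp == stamp)
        {
            return entry.Doc;                     // cache hit, file unchanged
        }

        XDocument doc = XDocument.Load(filePath); // cache miss or stale entry
        Cache[filePath] = (stamp, doc);
        return doc;
    }
}
```

With this in place, XmlHasValue could call XmlCache.Get(filePath) instead of XDocument.Load(filePath). The first search is as slow as before; only repeated searches benefit.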
According to "What is the best way to parse (big) XML in C# Code", XmlReader is the fastest way to parse XML. This blog suggests XmlReader is about 2.5 times faster.
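As a rough illustration, XmlHasValue could be rewritten on top of XmlReader along the following lines. This is a sketch under the assumption that every file has the flat layout from the question (Protocol elements whose children contain only text); the class name XmlSearch is made up. Like the original, it only looks at the first Protocol whose Item matches, and it treats malformed XML as "no match".

```csharp
using System.Xml;

// Hypothetical container class; the method mirrors XmlHasValue from the question.
public static class XmlSearch
{
    public static bool XmlHasValue(string filePath, string itemValue, string element, string value)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(filePath))
            {
                while (reader.ReadToFollowing("Protocol"))
                {
                    bool itemMatches = false, elementMatches = false;
                    using (XmlReader protocol = reader.ReadSubtree())
                    {
                        protocol.Read();              // position on <Protocol>
                        protocol.Read();              // move to its first child node
                        while (!protocol.EOF)
                        {
                            if (protocol.NodeType == XmlNodeType.Element)
                            {
                                string name = protocol.LocalName;
                                // Reads the text and advances past the end tag,
                                // so no extra Read() is needed in this branch.
                                string text = protocol.ReadElementContentAsString();
                                if (name == "Item" && text == itemValue) itemMatches = true;
                                if (name == element && text == value) elementMatches = true;
                            }
                            else
                            {
                                protocol.Read();      // skip whitespace and end tags
                            }
                        }
                    }
                    // Same behavior as First() in the original: decide on the first matching Item.
                    if (itemMatches) return elementMatches;
                }
            }
            return false;
        }
        catch (XmlException)
        {
            return false;                             // malformed or unexpected XML
        }
    }
}
```

Unlike XDocument.Load, this never builds the whole tree in memory and stops reading as soon as the relevant Protocol has been seen. Whether that matters in practice depends on how much of the total time is parsing rather than IO.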
If you have multiple files, you could also try to process them in parallel. Keep in mind that IO is mostly serial, so you might not gain anything unless you have an SSD that can deliver data faster than the files can be processed.
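One low-effort way to try this is PLINQ. The sketch below is a parallel variant of GetFiles; to keep it self-contained, the file list and the per-file check are passed in as parameters, and the class name ParallelSearch is made up. AsOrdered keeps the results in the same order as the input, which the sequential version also does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original code.
public static class ParallelSearch
{
    public static List<string> GetFiles(IEnumerable<string> compatibleFiles, Func<string, bool> hasValue)
    {
        return compatibleFiles
            .AsParallel()      // spread the per-file checks over several threads
            .AsOrdered()       // preserve the input order in the result
            .Where(hasValue)
            .ToList();
    }
}
```

A call could look like ParallelSearch.GetFiles(compatibleFiles, path => XmlHasValue(path, itemValue, element, value)). It is worth measuring before and after, since on a slow disk the threads may simply end up waiting for each other's reads.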