I have an XML file in this format (but only much bigger) :
<customer>
<name>John</name>
<age>24</age>
<gender>M</gender>
</customer>
<customer>
<name>Keith</name>
<age></age> <!--blank value-->
<gender>M</gender>
</customer>
<customer>
<name>Jenny</name>
<age>21</age>
<gender>F</gender>
</customer>
<customer>
<name>John</name>
<age>24</age> <!--blank value-->
<gender>M</gender> <!--blank value-->
</customer>
I want to generate a DataTable which will be in this format :
Element Name Value Frequency
name filled 4
name blank 0
age filled 2
age blank 2
gender filled 3
gender blank 1
Currently I am completing this task in 2 parts, first creating a DataTable structure as above and setting all the frequencies to 0 as default. And then reading the XML using XmlReader, and increasing the count of the frequency everytime XmlReader finds a child element.
My problem is that the second function that I use for adding the actual count is taking too long for very big Xml files with many customers having many attributes. How can I improve the efficiency of this function?
My code :
static void AddCount(DataTable dt)
{
int count;
using (XmlReader reader = XmlReader.Create(@"C:\Usr\sample.xml"))
{
while (reader.Read())
{
if (reader.IsStartElement())
{
string eleName = reader.Name;
DataRow[] foundElements = dt.Select("ElementName = '" + eleName + "'");
if (!reader.IsEmptyElement)
{
count = int.Parse(foundElements.ElementAt(0)["Frequency"].ToString());
foundElements.ElementAt(0).SetField("Frequency", count + 1);
}
else
{
count = int.Parse(foundElements.ElementAt(0)["Frequency"].ToString());
foundElements.ElementAt(0).SetField("Frequency", count + 1);
}
}
}
}
}
I am also ready to change the XmlReader class for any other more efficient class for this task. Any advice is welcome.
It turned out that querying in the DataTable using the Select operation was very expensive and that was making my function very slow.
Instead of that, used a Dictionary<string, ValueFrequencyModel>
and queried on that to fill the dictionary with the count, and after completing that, converted the Dictionary<string, ValueFrequencyModel>
into a DataTable.
This saved loads of time for me and solved the problem.