I have the following code:
for (i = 1; i <= loopsNeeded; i++)
{
    lblCurrent.Text = string.Format("{0} of {1}", i, loopsNeeded);
    prgWriteProgress.Value = i;
    this.Refresh();
    reader = XmlReader.Create(FilePath);
    outputFile = CreateXmlOutputFileName(xmlFileInfo);
    FileStream stream = new FileStream(outputFile, FileMode.Create);
    xslArg = new XsltArgumentList();
    xslArg.AddParam("Index", "", currentNode);
    xslArg.AddParam("BatchSize", "", batchSize);
    transformation.Transform(reader, xslArg, stream);
    currentNode += batchSize;
    stream.Flush();
    stream.Close();
}
This code works its way through an XML file, extracting one batch of nodes per iteration and writing each batch to its own output file.
According to MSDN:
XmlReader provides forward-only, read-only access to a stream of XML data.
Because of this, I have to create a new XmlReader on every iteration so that each transform starts at the top of the file.
Tests gave the following feedback:
125,000 information nodes/125,000 per batch file = 48 mins.
125,000 information nodes/5,000 per batch file = 58 mins.
125,000 information nodes/500 per batch file = 2 hours 33 mins.
As you can see, smaller batch sizes carry a heavy penalty, because the 0.8 GB file has to be read again by a new XmlReader for every batch.
Is there a way to avoid re-creating the XmlReader each time, and so reduce this overhead?
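You can use the XDocument class (in System.Xml.Linq), which provides a far nicer abstraction of an XML document. Load the file once, outside the loop, and create a new reader over the in-memory tree on each iteration. Bear in mind that this keeps the entire 0.8 GB document in memory.
// Parse the file once, before the loop.
var xDocument = XDocument.Load(filePath);
for (i = 0; i < loopsNeeded; i++)
{
    ...
    // Each call returns a new reader positioned at the start of the in-memory document.
    var reader = xDocument.CreateReader();
    ...
}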
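To show how the pieces above might fit together, here is a minimal sketch of the loop with the document parsed once and a new reader created per batch. The stylesheet, the `items`/`item` element names, the `BatchSplitSketch` class and the `SplitIntoBatches` method are all hypothetical stand-ins, since the original stylesheet and file layout are not shown. The sketch also assumes `Index` is a zero-based offset, and it writes each batch to a string rather than a FileStream to stay self-contained.

```csharp
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class BatchSplitSketch
{
    // Hypothetical stylesheet: copies the <item> elements whose position lies in
    // (Index, Index + BatchSize]. The real stylesheet from the question is not shown.
    private const string Stylesheet = @"
<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output omit-xml-declaration='yes'/>
  <xsl:param name='Index'/>
  <xsl:param name='BatchSize'/>
  <xsl:template match='/items'>
    <batch>
      <xsl:copy-of select='item[position() &gt; $Index and position() &lt;= $Index + $BatchSize]'/>
    </batch>
  </xsl:template>
</xsl:stylesheet>";

    // Runs one transform per batch against an already-loaded document
    // and returns the output of each batch as a string.
    public static string[] SplitIntoBatches(XDocument source, int batchSize)
    {
        // Compile the stylesheet once, outside the loop.
        var transformation = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            transformation.Load(xsltReader);
        }

        int total = source.Root.Elements("item").Count();
        int loopsNeeded = (total + batchSize - 1) / batchSize; // ceiling division
        var results = new string[loopsNeeded];
        int currentNode = 0; // assumed zero-based offset of the first node in the batch

        for (int i = 0; i < loopsNeeded; i++)
        {
            var xslArg = new XsltArgumentList();
            xslArg.AddParam("Index", "", currentNode);
            xslArg.AddParam("BatchSize", "", batchSize);

            // A new reader over the in-memory tree: the file itself is not read again.
            using (XmlReader reader = source.CreateReader())
            using (var output = new StringWriter())
            {
                transformation.Transform(reader, xslArg, output);
                results[i] = output.ToString();
            }

            currentNode += batchSize;
        }

        return results;
    }
}
```

Calling `BatchSplitSketch.SplitIntoBatches(XDocument.Load(filePath), batchSize)` would then parse the file only once. Note that the transform still walks the whole in-memory tree for every batch, so the saving is in disk I/O and parsing rather than in the transform itself.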