Tags: c#, xml, performance, xmlreader

More efficient use of an XmlReader


I have the following code:

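    for (i = 1; i <= loopsNeeded; i++)
    {
        // Update the progress display for this batch.
        lblCurrent.Text = string.Format("{0} of {1}", i, loopsNeeded);
        prgWriteProgress.Value = i;
        this.Refresh();

        outputFile = CreateXmlOutputFileName(xmlFileInfo);

        // Tell the stylesheet which slice of nodes to emit.
        xslArg = new XsltArgumentList();
        xslArg.AddParam("Index", "", currentNode);
        xslArg.AddParam("BatchSize", "", batchSize);

        // A fresh reader is created on every pass so the transform starts at
        // the top of the file. The using blocks dispose the reader and
        // flush and close the output stream.
        using (XmlReader reader = XmlReader.Create(FilePath))
        using (FileStream stream = new FileStream(outputFile, FileMode.Create))
        {
            transformation.Transform(reader, xslArg, stream);
        }

        currentNode += batchSize;
    }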

This code works through an XML file progressively, picking out one batch of information on each iteration and writing it to its own output file.

According to MSDN:

XmlReader provides forward-only, read-only access to a stream of XML data.

This means I have to re-create the XmlReader on every iteration to ensure it starts at the top of the file.

Tests gave the following feedback:

125,000 information nodes/125,000 per batch file = 48 mins.  
125,000 information nodes/5,000 per batch file = 58 mins.  
125,000 information nodes/500 per batch file = 2 hours 33 mins.

As you can see, there is a heavy penalty for smaller batch sizes, because the 0.8 GB file has to be reloaded into the XmlReader for every batch.

Is there a way to avoid re-creating the XmlReader each time, and so reduce the overhead I encounter?


Solution

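  • You can use the XDocument class. It provides a far nicer abstraction of an XML document. Load the file once, outside the loop, and create a new reader over the in-memory document on each iteration.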

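    // Parse the file once, before the loop.
    var xDocument = XDocument.Load(filePath);
    for (i = 0; i < loopsNeeded; i++)
    {
        ...
        // Each reader starts at the top of the in-memory document,
        // so the file itself is not read again.
        var reader = xDocument.CreateReader();
        ...
    }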
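
To make the idea concrete, here is a minimal, self-contained sketch of the same pattern. It is an illustration under stated assumptions, not the asker's actual program. The stylesheet from the question is not shown, so the `Xslt` string below is a hypothetical stand-in that copies `<item>` elements by position. The `BatchSplitter` class, the `Split` method and the `<items>`/`<item>` element names are likewise invented for the example, and the output goes to strings rather than files so the sketch can run on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Xsl;

public static class BatchSplitter
{
    // Hypothetical stylesheet (the real one is not shown in the question).
    // It copies the <item> children whose position is in [Index, Index + BatchSize).
    private const string Xslt = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:param name='Index'/>
  <xsl:param name='BatchSize'/>
  <xsl:output omit-xml-declaration='yes'/>
  <xsl:template match='/items'>
    <batch>
      <xsl:copy-of select='item[position() &gt;= $Index and position() &lt; $Index + $BatchSize]'/>
    </batch>
  </xsl:template>
</xsl:stylesheet>";

    // Runs the transform once per batch against a document that was parsed
    // a single time by the caller.
    public static List<string> Split(XDocument source, int batchSize)
    {
        var transformation = new XslCompiledTransform();
        using (XmlReader xsltReader = XmlReader.Create(new StringReader(Xslt)))
        {
            transformation.Load(xsltReader);
        }

        int total = source.Root.Elements("item").Count();
        int loopsNeeded = (total + batchSize - 1) / batchSize; // round up
        int currentNode = 1;                                   // XPath positions are 1-based
        var outputs = new List<string>();

        for (int i = 1; i <= loopsNeeded; i++)
        {
            var xslArg = new XsltArgumentList();
            xslArg.AddParam("Index", "", currentNode);
            xslArg.AddParam("BatchSize", "", batchSize);

            // A new reader over the in-memory tree; nothing is re-read from disk.
            using (XmlReader reader = source.CreateReader())
            using (var writer = new StringWriter())
            {
                transformation.Transform(reader, xslArg, writer);
                outputs.Add(writer.ToString());
            }

            currentNode += batchSize;
        }

        return outputs;
    }

    public static void Main()
    {
        // In the question this would be XDocument.Load(FilePath).
        XDocument source = XDocument.Parse(
            "<items><item>1</item><item>2</item><item>3</item><item>4</item><item>5</item></items>");

        foreach (string batch in Split(source, 2))
        {
            Console.WriteLine(batch);
        }
    }
}
```

One trade-off to keep in mind: XDocument holds the whole document in memory, so with a file of around 0.8 GB the memory footprint may be substantial. Whether that is acceptable depends on the machine the conversion runs on.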