Search code examples
.netopenxmlxlsxclosedxml

Export big amount of data from XLSX - OutOfMemoryException


I am approaching to export a big amount of data (115.000 rows x 30 columnd) in Excel OpenXML Format (xlsx). I am using some libraries like DocumentFormat.OpenXML, ClosedXML, NPOI.

With each of this, OutOfMemoryException is thrown because the representation of the sheet in the memory causes an exponential memory increase.

Also closing the document file every 1000rows (and releasing memory), the next loading causes memory increase.

Is there a more performant way to export data in xlsx without occupy a lot of memory?


Solution

  • The OpenXML SDK is the right tool for this job but you need to be careful to use the SAX (Simple API for XML) approach rather than the DOM approach. From the linked wikipedia article for SAX:

    Where the DOM operates on the document as a whole, SAX parsers operate on each piece of the XML document sequentially

    This vastly reduces the amount of memory consumed when handling large Excel files.

    There's a good article on it here - http://polymathprogrammer.com/2012/08/06/how-to-properly-use-openxmlwriter-to-write-large-excel-files/

    Adapted from that article, here's an example that outputs 115k rows with 30 columns:

    public static void LargeExport(string filename)
    {
        using (SpreadsheetDocument document = SpreadsheetDocument.Create(filename, SpreadsheetDocumentType.Workbook))
        {
            //this list of attributes will be used when writing a start element
            List<OpenXmlAttribute> attributes;
            OpenXmlWriter writer;
    
            document.AddWorkbookPart();
            WorksheetPart workSheetPart = document.WorkbookPart.AddNewPart<WorksheetPart>();
    
            writer = OpenXmlWriter.Create(workSheetPart);            
            writer.WriteStartElement(new Worksheet());
            writer.WriteStartElement(new SheetData());
    
            for (int rowNum = 1; rowNum <= 115000; ++rowNum)
            {
                //create a new list of attributes
                attributes = new List<OpenXmlAttribute>();
                // add the row index attribute to the list
                attributes.Add(new OpenXmlAttribute("r", null, rowNum.ToString()));
    
                //write the row start element with the row index attribute
                writer.WriteStartElement(new Row(), attributes);
    
                for (int columnNum = 1; columnNum <= 30; ++columnNum)
                {
                    //reset the list of attributes
                    attributes = new List<OpenXmlAttribute>();
                    // add data type attribute - in this case inline string (you might want to look at the shared strings table)
                    attributes.Add(new OpenXmlAttribute("t", null, "str"));
                    //add the cell reference attribute
                    attributes.Add(new OpenXmlAttribute("r", "", string.Format("{0}{1}", GetColumnName(columnNum), rowNum)));
    
                    //write the cell start element with the type and reference attributes
                    writer.WriteStartElement(new Cell(), attributes);
                    //write the cell value
                    writer.WriteElement(new CellValue(string.Format("This is Row {0}, Cell {1}", rowNum, columnNum)));
    
                    // write the end cell element
                    writer.WriteEndElement();
                }
    
                // write the end row element
                writer.WriteEndElement();
            }
    
            // write the end SheetData element
            writer.WriteEndElement();
            // write the end Worksheet element
            writer.WriteEndElement();
            writer.Close();
    
            writer = OpenXmlWriter.Create(document.WorkbookPart);
            writer.WriteStartElement(new Workbook());
            writer.WriteStartElement(new Sheets());
    
            writer.WriteElement(new Sheet()
            {
                Name = "Large Sheet",
                SheetId = 1,
                Id = document.WorkbookPart.GetIdOfPart(workSheetPart)
            });
    
            // End Sheets
            writer.WriteEndElement();
            // End Workbook
            writer.WriteEndElement();
    
            writer.Close();
    
            document.Close();
        }
    }
    
    //A simple helper to get the column name from the column index. This is not well tested!
    private static string GetColumnName(int columnIndex)
    {
        int dividend = columnIndex;
        string columnName = String.Empty;
        int modifier;
    
        while (dividend > 0)
        {
            modifier = (dividend - 1) % 26;
            columnName = Convert.ToChar(65 + modifier).ToString() + columnName;
            dividend = (int)((dividend - modifier) / 26);
        }
    
        return columnName;
    }