Search code examples
excelopenxmldatareader

OpenXML - continue writing to next worksheet (million row hit)


According to Microsoft specifications .xlsx files have limit of 1,048,576 rows PER worksheet. In theory, as I see It, that means we could write file even with 2 million of rows - with two worksheets in same workbook.

I'm using OpenXML package with SAX approach, which (imho) is still best for writing large Excel files. I also extended my solution to write .xlsx file directly from DataReader in order to avoid any Out Of memory exception, because our users usually export very large amount of data.

That being said, I'm facing an issue when users want to export data larger than 1,048,576 rows - as .xlsx limit is (yes, they actually export that amount).

Currently they can do It with 2 steps by creating separate .xlsx files, but I'm wondering If that could be done in a single file?

For the code part: I've set a variable which checks row number (row_number) and if it hits 1 million then a new worksheet should be created, in order to continue writing data from same DataReader to the next sheet.

However I'm facing issues when creating new sheet, as my data is being written by OpenXmlWriter, which already holds a Sheetpart instance for sheet1. As I see it, maybe this would work If I could pass a reference of sheet2 to OpenXmlWriter:

int row_number = 0;

using (var Excel_doc = SpreadsheetDocument.Create(file_path, SpreadsheetDocumentType.Workbook))
{
      var workbookPart = Excel_doc.AddWorkbookPart();

      Excel_doc.WorkbookPart.Workbook = new Workbook
      {
             Sheets = new Sheets()
      };

      var sheetPart = Excel_doc.WorkbookPart.AddNewPart<WorksheetPart>();

      //Add sheet
      Sheets sheets = Excel_doc.WorkbookPart.Workbook.GetFirstChild<Sheets>();
      string relationshipId = Excel_doc.WorkbookPart.GetIdOfPart(sheetPart);

      uint sheetId = 1;
      if (sheets.Elements<Sheet>().Count() > 0)
      {
         sheetId = sheets.Elements<Sheet>().Select(s => s.SheetId.Value).Max() + 1;
      }

      Sheet sheet = new Sheet() { Id = relationshipId, SheetId = sheetId, Name = "Sheet " + sheetId };
      sheets.Append(sheet);

      using (var XML_write = OpenXmlWriter.Create(sheetPart))
      {
          XML_write.WriteStartElement(new Worksheet()); 
          XML_write.WriteStartElement(new SheetData());

          //Writing data using DataReader...
          using (OracleDataReader reader = cmd.ExecuteReader())
          {
              while (reader.Read())
              {
                 XML_write.WriteStartElement(new Row());

                 for (int i = 0; i < reader.FieldCount; i++)
                 {
                      row_number++;
                 }
                 XML_write.WriteEndElement(); //End of row

                //If 1 million row exceeded then proceed writing to next sheet - here is where I'm stucked
                if (row_number>1000000)
                {
                    sheetId +=1;   
                    Sheet sheet1 = new Sheet() { Id = relationshipId, SheetId = sheetId, Name = "List " + sheetId };
                    sheets.Append(sheet1);

                    XML_write.WriteEndElement(); 
                    XML_write.WriteEndElement(); 
                    XML_write.WriteStartElement(new Worksheet()); 
                    XML_write.WriteStartElement(new SheetData());
                    row_number=0;
                 }
              }
          }
          XML_write.WriteEndElement(); 
          XML_write.WriteEndElement(); 
          XML_write.Close();
      }
}     

While writing .xlsx file this code terminates with an error:

Token StartElement in state EndRootElement would result in an invalid XML document. Make sure that the ConformanceLevel setting is set to ConformanceLevel.Fragment or ConformanceLevel.Auto if you want to write an XML fragment

I would be more than happy If anyone has a solution to this, or a suggestion to make It work.

P.S.: Something similar already exists in some solutions - e.g. Toad for Oracle, which exports into .xls file on multiple sheets after hitting max 65k rows. So probably It can be done.


Solution

  • Seems like you need to swap the order of the loops, basically. Open your connection, then create a sheet and use it until a counter hits 1 million, then close it and create another.

    Here's some basic pseudocode.

    count = 0
    sheet = new
    writer = new writer(sheet)
    using (reader)
    {
        foreach (row in reader)
        {
            if (count % 1,000,000 == 0)
            {
                writer.close
                sheet = new
                writer = new writer(sheet)
            }
            writer.write(reader.read)
            count++
        }
    }
    writer.close