Search code examples
c#excelopenxmlsaxopenxml-sdk

Is there a SAX way to loop through OpenXML rows?


I'm parsing a large file using the SAX approach offered on: Parsing and Reading Large Excel Files with the Open XML SDK

This is my modified version (only getting the row number for simplicity)

using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("BigFile.xlsx", true))
{
    WorkbookPart workbookPart = myDoc.WorkbookPart;
    WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

    OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
    String rowNum;
    while (reader.Read())
    {
        if (reader.ElementType == typeof(Row))
        {
            if (reader.HasAttributes)
                rowNum = reader.Attributes.First(a => a.LocalName == "r").Value
        }
    }
}

The problem is that this loops through every item/cell/column/whatnot and only acts when the element type is Row.

Is there a SAX way to loop only through the rows and not every item in the worksheet?

Thanks,


Solution

  • The key is to use the Skip() and ReadNextSibling() methods of the reader...

     using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open("BigFile.xlsx", true))
     {
         WorkbookPart workbookPart = myDoc.WorkbookPart;
         WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();
         OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
         String rowNum;
         while (reader.Read())
         {
             if (reader.ElementType == typeof(Row))
             {
                 do
                 {
                     if (reader.HasAttributes)
                         rowNum = reader.Attributes.First(a => a.LocalName == "r").Value;
    
                 } while (reader.ReadNextSibling()); // Skip to the next row
                 break; // We just looped through all the rows so no need to continue reading the worksheet
             }
    
             if (reader.ElementType != typeof(Worksheet)) // Dont' want to skip the contents of the worksheet
                 reader.Skip(); // Skip contents of any node before finding the first row.
         }
    }