I'm building an app in VB.NET to read rows from an Excel file and populate them into a DataTable
.
dtRow = dataTable.NewRow()
Dim startTime As DateTime = DateTime.Now
dtRow("name") = suppliers.CellValue("A", rowCount)
/* SNIP - just more string retrieval */
dtRow("statistics") = suppliers.CellValue("P", rowCount)
dataTable.Rows.Add(dtRow)
Dim endTime As DateTime = DateTime.Now
Debug.Print(String.Format("Time elapsed to retrieve '{0}': {1} ms", rowCount, (endTime - startTime).ToString("fffffff")))
CellValue
is my own creation- but it is a small function and I've already measured its elapsed time. It's quite fast.
However, when I open a 10,000 row Excel file (filled with identical data), the time to process gets much slower.
3,000 rows:
Time elapsed to retrieve '2': 0510051 ms
Time elapsed to retrieve '3': 0500050 ms
Time elapsed to retrieve '4': 0340034 ms
Time elapsed to retrieve '5': 0350035 ms
Time elapsed to retrieve '6': 0340034 ms
Time elapsed to retrieve '7': 0340034 ms
Time elapsed to retrieve '8': 0350035 ms
6,000 rows:
Time elapsed to retrieve '2': 0710071 ms
Time elapsed to retrieve '3': 0760076 ms
Time elapsed to retrieve '4': 0620062 ms
Time elapsed to retrieve '5': 0670067 ms
Time elapsed to retrieve '6': 0750075 ms
Time elapsed to retrieve '7': 0750075 ms
Time elapsed to retrieve '8': 0700070 ms
10,000 rows:
Time elapsed to retrieve '2': 0920092 ms
Time elapsed to retrieve '3': 0920092 ms
Time elapsed to retrieve '4': 1790179 ms
Time elapsed to retrieve '5': 1810181 ms
Time elapsed to retrieve '6': 1930193 ms
Time elapsed to retrieve '7': 2240224 ms
Time elapsed to retrieve '8': 1820182 ms
Why does this occur? Can I fix it?
EDIT: suppliers
is a class I have created to handle the Excel file, with this constructor:
Public Sub New(ByVal doc As SpreadsheetDocument, ByVal sheetName As String)
pWorkbookPart = doc.WorkbookPart
Dim sheet As Sheet = pWorkbookPart.Workbook.Descendants(Of Sheet).Where(Function(s) s.Name = sheetName).FirstOrDefault()
pWorksheetPart = CType(pWorkbookPart.GetPartById(sheet.Id), WorksheetPart)
pSharedStringTable = pWorkbookPart.GetPartsOfType(Of SharedStringTablePart).FirstOrDefault()
End Sub
CellValue
:
Public Function CellValue(ByVal column As String, ByVal row As Integer) As String
Dim cellAddress As String = column & row
Dim cell As Cell = pWorksheetPart.Worksheet.Descendants(Of Cell).Where(Function(c) c.CellReference = cellAddress).FirstOrDefault()
Dim index As Integer
Dim returnValue As String
If cell IsNot Nothing Then
If cell.DataType IsNot Nothing Then
index = Integer.Parse(cell.InnerText)
returnValue = pSharedStringTable.SharedStringTable.ElementAt(index).InnerText
Else
returnValue = CStr(cell.InnerText)
End If
End If
Return returnValue
End Function
This line looks suspcious:
Dim cell As Cell = pWorksheetPart.Worksheet.Descendants(Of Cell).Where(Function(c) c.CellReference = cellAddress).FirstOrDefault()
If the .Where() condition executes against every cell in the spreadsheet. As the number of rows grows, the number of cell address comparisons grows by (rows x columns). Even if the cell reference comparison operation is trivially simple, it adds up fast.
If the OpenXML or the Workbook class shown do not provide convenient x,y cell addressing, you may have to create your own indexes. Make one pass over all the cells to add them to your own column lists, then you can index by x,y with abandon. x = index of the column list from the list of columns, y = index into the column list to fetch the cell.