OpenXmlReader is not extracting all the cell contents from an excel .xlsx sheet
I have some code which was originally taken from SO and converted to VB Net: Using OpenXmlReader
My code loops through the first 14 rows of the worksheet without any problem but then will not find any cells further down.
reader.ElementType appears to find rows past 14 (If reader.ElementType Is GetType(Row) Then), but no more cells (GetType(Cell))
The last cell extracted is L14, but the excel file has cell contents up to L29.
Source Code:
Public Shared Function fncParseXLSXorXLSM(strFileName As String, sbTxtFromFile As StringBuilder) As StringBuilder
sbTxtFromFile.Length = 0
Dim intFirst As Integer = 1
Try
Using spreadsheetDocument__1 As SpreadsheetDocument = SpreadsheetDocument.Open(strFileName, False)
Dim workbookPart As WorkbookPart = spreadsheetDocument__1.WorkbookPart
Dim sharedStringItemsArray As SharedStringItem() = workbookPart.SharedStringTablePart.SharedStringTable.Elements(Of SharedStringItem)().ToArray()
Dim sheets As DocumentFormat.OpenXml.Spreadsheet.Sheets = spreadsheetDocument__1.WorkbookPart.Workbook.Sheets
' For each sheet, display the sheet information.
For Each sheet As DocumentFormat.OpenXml.OpenXmlElement In sheets
For Each attr As DocumentFormat.OpenXml.OpenXmlAttribute In sheet.GetAttributes()
Debug.Print("{0}: {1}", attr.LocalName, attr.Value)
If attr.LocalName = "state" And attr.Value = "hidden" Then
Debug.Print("{0}: {1}", attr.LocalName, attr.Value)
End If
Next
Next
For Each worksheetPart As WorksheetPart In workbookPart.WorksheetParts
Dim reader As OpenXmlReader = OpenXmlReader.Create(worksheetPart)
While reader.Read()
If reader.ElementType Is GetType(Row) Then
reader.ReadFirstChild()
Do
If reader.ElementType Is GetType(Cell) Then
Dim c As Cell = DirectCast(reader.LoadCurrentElement(), Cell)
Debug.Print(c.CellReference.ToString)
'********************************************************************
' Placed here to identify the last row openXmlReader seems to get to
'********************************************************************
If c.CellReference.ToString = "L14" Then
Stop
End If
'********************************************************************
' Placed here to identify the last row openXmlReader seems to get to
'********************************************************************
If c.DataType IsNot Nothing AndAlso c.DataType.Value.ToString = "SharedString" Then
'Dim ssi As SharedStringItem = workbookPart.SharedStringTablePart.SharedStringTable.Elements(Of SharedStringItem)().ElementAt(Integer.Parse(c.CellValue.InnerText))
Dim ssi As SharedStringItem = sharedStringItemsArray(Integer.Parse(c.CellValue.InnerText))
If Not ssi.Text Is Nothing Then
If Not ssi.Text.Text Is Nothing Then
If intFirst = 1 Then
sbTxtFromFile.Append(ssi.Text.Text)
intFirst = 2
Else
sbTxtFromFile.Append(Environment.NewLine & ssi.Text.Text)
End If
End If
Else
If Not ssi.InnerText Is Nothing Then
If ssi.InnerText Like "*Total of 25 CP TR Units*" Then
Stop
End If
If intFirst = 1 Then
sbTxtFromFile.Append(ssi.InnerText)
intFirst = 2
Else
sbTxtFromFile.Append(Environment.NewLine & ssi.InnerText)
End If
End If
End If
Else
If Not c.CellValue Is Nothing Then
If intFirst = 1 Then
sbTxtFromFile.Append(c.CellValue.InnerText)
intFirst = 2
Else
sbTxtFromFile.Append(Environment.NewLine & c.CellValue.InnerText)
End If
End If
End If
End If
Loop While reader.ReadNextSibling()
End If
If sbTxtFromFile.Length > 0 Then
sbTxtFromFile.Append(Environment.NewLine)
End If
End While
Next
End Using
Return sbTxtFromFile
Catch ex As Exception
If ex.Message Like "The process cannot access the file '*" Then 'File in use
sbTxtFromFile.Append("|11readonly11|")
End If
'MsgBox(ex.ToString)
End Try
End Function
Here is the link to the .xlsx file:
https://drive.google.com/file/d/0B7UpFja70T8_X1p5dzdPX3o3UzQ/view?usp=sharing
Although not quite the answer I wanted (I believe I should still be able to return all cells using my original code), I have resolved my issue by removing the outer 'IF' statement which was finding the row, reading the first child and looping through all the siblings. The code now just finds elements that are cells and returns the values.
'Previous Code
While reader.Read()
If reader.ElementType Is GetType(Row) Then
reader.ReadFirstChild()
Do
If reader.ElementType Is GetType(Cell) Then
'Code Here
End If
Loop While reader.ReadNextSibling()
End If
End While
'New Code
While reader.Read()
'If reader.ElementType Is GetType(Row) Then
'reader.ReadFirstChild()
'Do
If reader.ElementType Is GetType(Cell) Then
Do
'Code Here
Loop While reader.ReadNextSibling()
End If
'Loop While reader.ReadNextSibling()
'End If
End While