Search code examples
sqlvb.netexceloledb

Reading very large data from an Excel 2013 file using OleDB range error


I am trying to read an Excel 2013 file (.xlsx which is about 100 MB in size) using Visual Basic.NET with the help of OleDB. The main concern was getting a system out of memory exception in the line:

da.Fill(dt)

from the code below.

Private Function ReadExcelFile() As DataSet
    Dim ds As New DataSet()

    Dim connectionString As String =
    "Provider=Microsoft.ACE.OLEDB.12.0;;Extended Properties=Excel 12.0 XML;Data Source=C:\file.xlsx;"

    Using connection As New OleDbConnection(connectionString)
        connection.Open()
        Dim cmd As New OleDbCommand()
        cmd.Connection = connection
        Dim dtSheet As DataTable = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)

        For Each dr As DataRow In dtSheet.Rows
            Dim sheetName As String = dr("TABLE_NAME").ToString()
            If Not sheetName.EndsWith("$") Then
                Continue For
            End If

            cmd.CommandText = "SELECT * FROM [" & sheetName & "];"
            Dim dt As New DataTable()
            dt.TableName = sheetName
            Dim da As New OleDbDataAdapter(cmd)
            da.Fill(dt)
            ds.Tables.Add(dt)
        Next

        cmd = Nothing
        connection.Close()
    End Using
    Return ds
End Function

but I think the best solution is to read the data by chunks so I found out that I could read the data by adding a column range in the SQL statement like this:

 cmd.CommandText = "SELECT * FROM [" & sheetName & "B1:B10];"

I did a loop by doing increments on that range but I found an error. With this for example,

cmd.CommandText = "SELECT * FROM [" & sheetName & "B50000:B51000];"

It still works. However, if I do,

cmd.CommandText = "SELECT * FROM [" & sheetName & "B70000:B70001];"

I get this error.

OleDb Error

Note that the Excel file has 475128 rows and B70000-B70001 isn't even half the total.

Total Columns

Could anyone shed some light? I think I'm missing something here.


Solution

  • I found a working solution. Instead of using DataSet, use DataReader. I can just add a worker so it won't hang up.

     Private Function ReadExcelFile() As DataSet
        Dim ds As New DataSet()
    
        Dim connectionString As String = GetConnectionString()
    
        Using connection As New OleDbConnection(connectionString)
            connection.Open()
            Dim cmd As New OleDbCommand()
            cmd.Connection = connection
            Dim dtSheet As DataTable = connection.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, Nothing)
    
            For Each dr As DataRow In dtSheet.Rows
                Dim sheetName As String = dr("TABLE_NAME").ToString()
                If Not sheetName.EndsWith("$") Then
                    Continue For
                End If
                cmd.CommandText = "SELECT * FROM [" & sheetName & "];"
                Dim ddr As OleDbDataReader = cmd.ExecuteReader()
                Dim counter As Integer = 0
                While (ddr.Read())
                    MessageBox.Show(ddr.GetValue(0))
                End While
            Next
            cmd = Nothing
            connection.Close()
        End Using
        Return ds
    End Function
    

    The lines:

    Dim ddr As OleDbDataReader = cmd.ExecuteReader()
    Dim counter As Integer = 0
    While (ddr.Read())
         MessageBox.Show(ddr.GetValue(0))
    End While
    

    are the essential code where you can access the rows of the first column (index 0). This works because I read that DataSet is an in-memory object (Which is the reason why we possible can get a system out of memory exception) - Check here for reference

    I'd still want to hear about why my above problem is popping up.