Search code examples
.netvb.netoledb

Import range of rows / batches from CSV with OleDB


I want to fetch dynamically selected rows with specific range with ole db, thats the code that i wrote:

Dim ConStr As String = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & TextBox1.Text & ";Extended Properties=""TEXT;HDR=Yes;FMT=Delimited"""
        Dim conn As New OleDb.OleDbConnection(ConStr)
        Dim dt As New DataTable
            Try
            Dim CMD As String = "Select * from " & _table & ".csv WHERE ID BETWEEN " & min & "AND " & max
            Dim da As New OleDb.OleDbDataAdapter(CMD, conn)
            da.Fill(dt)

it runs in a for loop with max and min and fetches rows step for step and passes them to another function, but the problem is that the code above doesn't really work because it looks after ID but it should actually look after rows. I mean there are 70.000 rows but 200.000 ids. how can i select rows in oledb?


Solution

  • Fundamentally, Ids do not really have any relation to row count except by accident. There is an overload for the DataAdapter.Fill() method which lets you specify the range of rows loaded. I double checked to be sure it works with the OleDB Text Driver.

    Private OLECSVConnstr = ...your text driver connection string
    Private firstRow As Int32 = 0
    Private rowCount As Int32 = 1000
    ...
    Private Function ImportRows(csvFile As String) As Int32
    
        Dim SQL = String.Format("SELECT * FROM {0}", csvFile)
        Dim rows As Int32
    
        Using dbcon As New OleDbConnection(OLECSVConnstr)
            Using cmd As New OleDbCommand(SQL, dbcon)
    
                ' DataTable is IDiposable, so dont just
                '   (re)create a new one each time
                If dtSample Is Nothing Then
                    dtSample = New DataTable
                Else
                    dtSample.Rows.Clear()
                End If
    
                Using da As New OleDbDataAdapter(cmd)
                    dbcon.Open()
                    rows = da.Fill(firstRow, rowCount, dtSample)
                End Using
    
                ' increment firstrow for next time
                firstRow += rowCount
    
            End Using
        End Using
    
        dgv2.DataSource = dtSample
        Return rows
    
    End Function
    

    Edit: Since DataTable implements IDisposable creating a new one for each "batch"/row set may result in a leak. The above checks and creates a new one if needed and clears the rows otherwise. Alternatively, you could create the DataAdapter once only.

    DataAdapter.Fill(int first, int count, DataTable dt) allows you to tell the adapter the first row to load and how many. The ImportRows method above returns how many rows were actually loaded so when it returns fewer than the number requested, it should mean there are no more rows.


    With such a big file, I would use a Schema.INI with the OleDB Text Driver so you can specify the data types used for each row. Without it, the Text Driver determines data types (guesses) from the data in the first few rows.

    I should mention that CSVHelper could also be used to load rows in batches as well. It reads the CSV a line at a time as you call Read(), so all you'd have to do is maintain a reference to the CSV reader and call it in a loop to get N rows at a time.

    DBDataAdapter.Fill on MSDN