Search code examples
vb.netsql-server-2012large-data

Loading a very large table from mssql to vb.net memory


So i got a very large table, with about 22 mio rows in it.. i wan to load them all into a vb.net application memory (i got 36GB of RAM, so i should be okay).

now for the connection not to timeout, i wanted to ask for the best way to load that large a table.

and if there was a way to give feedback to the application, on the process, as it might take a couple of minutes to do.

i normaly use the sqldatareader.. but is that okay to use at that large amount of data?

i tryed googling a little about this problem.. my reason to load an entire table into the memory is to analyze it faster, as i need to run some regex, and sorting on it, more then TSQL offers.

i hope anyone can help, as i'm kinda stuck with this


Solution

  • You should use ROW_NUMBER to partition the huge resultset into smaller chunks. Then you could report progress on every chunk. Therefore you could use a BackGroundWorker to update a Label and/or a ProgressBar. To determine the size of each group you can select the total-rowcount first. Use this as divisor for the the chunksize (1000 in the following sample):

    Here is a working approach that fills a DataTable and select only small groups from the toal result using LINQ:

    On button-click start the BackGroundWorker:

    Private Sub SomeButton_Click(sender As System.Object, e As System.EventArgs) Handles Button2.Click
        Me.BackgroundWorker1.RunWorkerAsync()
    End Sub
    

    Handle the DoWork event to load the data:

    Private Sub BackgroundWorker1_DoWork(sender As Object, e As System.ComponentModel.DoWorkEventArgs) Handles BackgroundWorker1.DoWork
        Dim tblData As New DataTable()
        Dim totalCount = 0
        Dim chunkSize As Int32 = 1000
        Dim countSQL = "SELECT COUNT(*) FROM dbo.tabData"
        Dim dataSql = "WITH CTE AS(SELECT d.*, rn=ROW_NUMBER()OVER(ORDER BY d.idData) FROM dbo.tabData d) SELECT * FROM CTE WHERE RN BETWEEN @RowStart AND @RowEnd;"
        Using con As New SqlConnection(My.Settings.ConnectionString)
            Using cmdCount = New SqlCommand(countSQL, con)
                con.Open()
                totalCount = DirectCast(cmdCount.ExecuteScalar, Integer)
            End Using
            Dim chunks = Enumerable.Range(0, totalCount).
                GroupBy(Function(i) i \ chunkSize).
                Select(Function(grp, index) New With {
                           .RowStart = grp.Min() + 1,
                           .RowEnd = grp.Max() + 1,
                           .GroupNum = index + 1
                       })
            For Each chunk In chunks
                Using cmdData = New SqlCommand(dataSql, con)
                    cmdData.Parameters.AddWithValue("@RowStart", chunk.RowStart)
                    cmdData.Parameters.AddWithValue("@RowEnd", chunk.RowEnd)
                    Using da = New SqlDataAdapter(cmdData)
                        da.Fill(tblData)
                        BackgroundWorker1.ReportProgress(Math.Ceiling(chunk.GroupNum * chunkSize / totalCount))
                    End Using
                End Using
            Next
            BackgroundWorker1.ReportProgress(100)  ' all data loaded '
        End Using
    End Sub
    

    Update the label and/or the ProgressBar on every chunk and finally:

    Private Sub BackgroundWorker1_ProgressChanged(sender As Object, e As System.ComponentModel.ProgressChangedEventArgs) Handles BackgroundWorker1.ProgressChanged
        Me.ProgressLabel.Text = e.ProgressPercentage & " Percent loaded"
        Me.ProgressBar1.Value = e.ProgressPercentage
    End Sub
    
    Private Sub BackgroundWorker1_RunWorkerCompleted(sender As Object, e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles BackgroundWorker1.RunWorkerCompleted
        Me.ProgressLabel.Text = "100 Percent loaded. Finished."
    End Sub