So i got a very large table, with about 22 mio rows in it.. i wan to load them all into a vb.net application memory (i got 36GB of RAM, so i should be okay).
now for the connection not to timeout, i wanted to ask for the best way to load that large a table.
and if there was a way to give feedback to the application, on the process, as it might take a couple of minutes to do.
i normaly use the sqldatareader.. but is that okay to use at that large amount of data?
i tryed googling a little about this problem.. my reason to load an entire table into the memory is to analyze it faster, as i need to run some regex, and sorting on it, more then TSQL offers.
i hope anyone can help, as i'm kinda stuck with this
You should use ROW_NUMBER
to partition the huge resultset into smaller chunks. Then you could report progress on every chunk. Therefore you could use a BackGroundWorker
to update a Label
and/or a ProgressBar
. To determine the size of each group you can select the total-rowcount first. Use this as divisor for the the chunksize (1000 in the following sample):
Here is a working approach that fills a DataTable
and select only small groups from the toal result using LINQ
:
On button-click start the BackGroundWorker
:
Private Sub SomeButton_Click(sender As System.Object, e As System.EventArgs) Handles Button2.Click
Me.BackgroundWorker1.RunWorkerAsync()
End Sub
Handle the DoWork
event to load the data:
Private Sub BackgroundWorker1_DoWork(sender As Object, e As System.ComponentModel.DoWorkEventArgs) Handles BackgroundWorker1.DoWork
Dim tblData As New DataTable()
Dim totalCount = 0
Dim chunkSize As Int32 = 1000
Dim countSQL = "SELECT COUNT(*) FROM dbo.tabData"
Dim dataSql = "WITH CTE AS(SELECT d.*, rn=ROW_NUMBER()OVER(ORDER BY d.idData) FROM dbo.tabData d) SELECT * FROM CTE WHERE RN BETWEEN @RowStart AND @RowEnd;"
Using con As New SqlConnection(My.Settings.ConnectionString)
Using cmdCount = New SqlCommand(countSQL, con)
con.Open()
totalCount = DirectCast(cmdCount.ExecuteScalar, Integer)
End Using
Dim chunks = Enumerable.Range(0, totalCount).
GroupBy(Function(i) i \ chunkSize).
Select(Function(grp, index) New With {
.RowStart = grp.Min() + 1,
.RowEnd = grp.Max() + 1,
.GroupNum = index + 1
})
For Each chunk In chunks
Using cmdData = New SqlCommand(dataSql, con)
cmdData.Parameters.AddWithValue("@RowStart", chunk.RowStart)
cmdData.Parameters.AddWithValue("@RowEnd", chunk.RowEnd)
Using da = New SqlDataAdapter(cmdData)
da.Fill(tblData)
BackgroundWorker1.ReportProgress(Math.Ceiling(chunk.GroupNum * chunkSize / totalCount))
End Using
End Using
Next
BackgroundWorker1.ReportProgress(100) ' all data loaded '
End Using
End Sub
Update the label and/or the ProgressBar
on every chunk and finally:
Private Sub BackgroundWorker1_ProgressChanged(sender As Object, e As System.ComponentModel.ProgressChangedEventArgs) Handles BackgroundWorker1.ProgressChanged
Me.ProgressLabel.Text = e.ProgressPercentage & " Percent loaded"
Me.ProgressBar1.Value = e.ProgressPercentage
End Sub
Private Sub BackgroundWorker1_RunWorkerCompleted(sender As Object, e As System.ComponentModel.RunWorkerCompletedEventArgs) Handles BackgroundWorker1.RunWorkerCompleted
Me.ProgressLabel.Text = "100 Percent loaded. Finished."
End Sub