Search code examples
vb.netmultithreadingtask-parallel-librarysql-server-ce

VB.net Multithreading with many database calls


I have a vb.net application in which users must process several million records to store in a database (sql CE). The processing goes as follows:

  1. Retrieve some saved data from the database to instantiate an object.
  2. Perform some calculations.
  3. Save the calculated object data to a different table in the database.

If done sequentially, the first two items take about 30% of the time, and the last item takes about 70% of the time.

I thought it might be beneficial to do most of the database writes more like a batch process on a separate thread, hoping to cut the cost by (in an ideal world) the 30% required to do items 1 and 2. I tried doing this by storing the processed objects in a list, and whenever the list count exceeds a certain number calling an action on a separate thread, I save the data. Each time the data is saved, I actually save the data for the main object and several related child objects, i.e.,

cmd.CommandText = "INSERT INTO [MainObjectTable] (Data1, Data2, Data3) VALUES ( @Data1, @Data2, @Data3)"
cmd.ExecuteNonQuery()

cmd2.CommandText = "SELECT MAX(idx) FROM [MainObjectTable]"
MainObjectIdx = CInt(cmd2.ExecuteScalar())

'Loop over child objects and save each one to database'
cmd3.CommandText = "INSERT INTO [ChildObject] (MainObjectIdx, Data4, Data5, Data6) VALUES ( MainObjectIdx, @Data4, @Data5, @Data6 )"

for i = 0 to ChildList.Count -1
     [Code to Update parameters for the command object]
     cmd3.ExecuteNonQuery()
next

I lock the database to prevent multiple records from trying to save at once. I must do this (I think) because I use the main object's record key for further processing of a child object's data, as shown above.

The code essentially looks something like this:

Private sub CalcData()
    Dim objectList as new List(of ParentObject)
    For k = 0 to Calculations
       'Load data, instantiate object'
       Dim o as new ParentObject(IDs(k)) '<-- This pulls records from the sql ce database and loads the object'
       o.calcData  'calculate the new data based on loaded data'
       objectList.add(o) 'store in a list'
       If objectList.Count > 1000 then
            Batchsave(objectList)
            objectList.clear()
       End If
    Next
End Sub

Private Sub BatchSave(objList As List(of ParentObject))                                          
    mTaskList.Add(Tasks.Task.Factory.StartNew(
            Sub()
                DBLock.EnterWriteLock()
                Try
                  for j = 0 to objectList.count-1
                    [Code to update command object parameters, and save the object (and children) as discussed above where I show the sql statements]
                  next
                Finally
                  DBLock.ExitWriteLock()
                End Try
            End Sub))              
End Sub

I thought this scheme would maximize the performance, allowing the data saving to be done on a background thread. I structured the save like a batch process (for 1000 records at a time) because I have read that it's more efficient to parameterize the sql when updating many records. But the time reduction was not terribly impressive.

I also tried creating a new "Save" class to which I passed the data to be saved as it becomes available. The "Save" class handles creating a new tasks.task each time a parent object is passed to it, so I thought this would more or less create a continuous stream of objects to save on other threads rather than relying on saving every 1000 objects. Inside the "Save" class I have the following:

Public Class SaveData

Public Sub SaveBDLItem(ByVal o As ParentObject)

    Tasks.Task.Factory.StartNew(
            Sub()
                Dim Object   
                mParentLock.EnterWriteLock()
                Try
                    mcmd1.Parameters.Clear()

                    [code to add parameters to command object]

                    mcmd1.ExecuteNonQuery()
                    'get new key '
                    objectIDx= CInt(mcmd2.ExecuteScalar())
                Finally
                    mBDLLock.ExitWriteLock()
                End Try

                'Now update children'
                mChildLock.EnterWriteLock()
                Try
                    For j = 0 To ParentObject.Children.Count - 1
                        mcmd3.Parameters.Clear()

                        [code to add parameters to command object]

                        mcmd3.ExecuteNonQuery()

                    Next
                Finally
                    mChildLock.ExitWriteLock()
                End Try

            End Sub))

End Sub
.
.
.
End Class

However, this implementation was much slower than my previous attempt, and in fact seemed to be operating synchronously. Any idea why this method is actually slower?

I would also love feedback if there are other ideas of how to speed the whole process up.


Solution

  • Use the TableDirect APIs to avoid the overhead of the query processor. For Inserts and Selects, you will see tremendous speed gains. And remember the a sql ce database is a single file, so it is limited jow much you can do in parallel. I have a number of recent blog posts with deatils on how to use the TableDirect apis