Tags: c#, t-sql, sql-server-2008, oledb, task-parallel-library

ExecuteNonQuery in parallel within a shared OleDbConnection / OleDbTransaction


I've discovered that OleDbConnection doesn't seem to be thread-safe: when used from multiple threads it appears to attempt to open multiple connections instead of sharing the single one.

// doesn't work
using (OleDbConnection oConn = TheDataAccessLayer.GetConnection())
using (OleDbTransaction oTran = oConn.BeginTransaction())
    Parallel.ForEach(ORMObjects, (ORMObject, State) =>
    {
        if (!State.ShouldExitCurrentIteration && !State.IsExceptional)
        {
            var Error = ORMObject.SomethingThatExecutesANonQuery(oConn, oTran);

            if (Error.Number != 0)
                State.Stop();
        }
    });

If I lock the connection around each ExecuteNonQuery the errors go away, but performance tanks.

    // works
    using (OleDbConnection oConn = TheDataAccessLayer.GetConnection())
    using (OleDbTransaction oTran = oConn.BeginTransaction())
        Parallel.ForEach(ORMObjects, (ORMObject, State) =>
        {
            if (!State.ShouldExitCurrentIteration && !State.IsExceptional)
            {
                lock (oConn)
                {
                    var Error = ORMObject.SomethingThatExecutesANonQuery(oConn, oTran);

                    if (Error.Number != 0)
                        State.Stop();
                }
            }
        });

Assume that

  • I can't change the nature of the ORM: the SQL cannot be batched into bulk operations

  • Business rules require that the interaction be performed within a single transaction

So:

  • Is there a better/more efficient way to parallelize OleDb interactions?

  • If not, is there an alternative to the OleDb client that can take full advantage of parallelism? (Maybe the native MSSQL client?)
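For concreteness, the switch I have in mind would look roughly like the sketch below. It is only a sketch: TheDataAccessLayer.GetSqlConnection is a hypothetical counterpart of my existing GetConnection() helper, and the ORM call is assumed to accept the SqlClient types. Note that SqlConnection instance members are not thread-safe either, so the lock stays.

    // Sketch only - the same shared connection/transaction pattern on the native
    // SQL Server provider (requires System.Data.SqlClient and System.Threading.Tasks).
    // TheDataAccessLayer.GetSqlConnection is a hypothetical counterpart of GetConnection().
    using (SqlConnection oConn = TheDataAccessLayer.GetSqlConnection())
    using (SqlTransaction oTran = oConn.BeginTransaction())
        Parallel.ForEach(ORMObjects, (ORMObject, State) =>
        {
            if (!State.ShouldExitCurrentIteration && !State.IsExceptional)
            {
                lock (oConn) // SqlConnection instance members are not thread-safe either
                {
                    var Error = ORMObject.SomethingThatExecutesANonQuery(oConn, oTran);

                    if (Error.Number != 0)
                        State.Stop();
                }
            }
        });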


Solution

  • Transactions need to be ACID, but "Durability" needs to be enforced only at the transaction's end. So the physical I/O to disk may be postponed until after the apparent SQL statement execution and actually done in the background, while your transaction is processing other statements.

    As a consequence, issuing SQL statements serially may not be much slower than issuing them concurrently. Consider this scenario:

    • Execute the SQL statement [A] that writes data. The disk is not actually touched, writes are simply queued for later, so the execution flow returns very quickly to the client (i.e. [A] does not block for long).
    • Execute the SQL statement [B] that writes data. Writes are queued and [B] does not block for long, just as before. The physical I/O of [A] may already be happening in the background at this point.
    • Other processing takes place in the transaction, while the DBMS performs the physical I/O to the disk in the background.
    • The transaction is committed.
      • If queued writes are finished, there is no need to wait.
      • If queued writes are not finished by now, the commit waits until they are. BTW, some databases can relax the "Durability" requirements to avoid this wait, but not MS SQL Server (AFAIK).

    Of course there are scenarios where this "auto-parallelism" of the DBMS would not work well - for example, when different statements' WHERE clauses touch different partitions on different disks: the DBMS would love to parallelize those statements, but it can't if they are fed to it one by one.

    In any case, don't guess where your performance bottleneck is. Measure it instead!
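    For example, here is a rough way to do that measurement. This is only a sketch: TransactionTimer, RunSerially and MyOrmObject are hypothetical names, and SomethingThatExecutesANonQuery stands in for the ORM call from your question. Time the serial variant against the locked Parallel.ForEach variant on the same workload, inside the same kind of transaction:

        using System;
        using System.Collections.Generic;
        using System.Data.OleDb;
        using System.Diagnostics;

        // Hypothetical measurement harness: time the serial variant and the locked
        // Parallel.ForEach variant against the same workload before picking one.
        static class TransactionTimer
        {
            public static TimeSpan Time(Action action)
            {
                var sw = Stopwatch.StartNew();
                action();
                sw.Stop();
                return sw.Elapsed;
            }

            // Serial variant: one statement at a time on the shared connection/transaction.
            public static void RunSerially(OleDbConnection conn, OleDbTransaction tran,
                                           IEnumerable<MyOrmObject> ormObjects)
            {
                foreach (var ormObject in ormObjects)
                {
                    var error = ormObject.SomethingThatExecutesANonQuery(conn, tran);
                    if (error.Number != 0)
                        break; // stop on the first error, mirroring State.Stop()
                }
            }
        }

        // Usage, inside the same using blocks as in your question:
        //   Console.WriteLine("serial: " +
        //       TransactionTimer.Time(() => TransactionTimer.RunSerially(oConn, oTran, ORMObjects)));
        //   (time the locked Parallel.ForEach version the same way and compare)

    If the serial run comes out close to the locked parallel one, the DBMS is already hiding most of the I/O latency for you and the extra threads are buying you nothing.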


    BTW, MARS will not help you parallelize your statements - according to MSDN: "Note, however, that MARS is defined in terms of interleaving, not in terms of parallel execution."