Search code examples
c#.netsql-servermultithreadingthread-abort

Thread abort leaves zombie transactions and broken SqlConnection


I feel like this behavior should not be happening. Here's the scenario:

  1. Start a long-running sql transaction.

  2. The thread that ran the sql command gets aborted (not by our code!)

  3. When the thread returns to managed code, the SqlConnection's state is "Closed" - but the transaction is still open on the sql server.

  4. The SQLConnection can be re-opened, and you can try to call rollback on the transaction, but it has no effect (not that I would expect this behavior. The point is there is no way to access the transaction on the db and roll it back.)

The issue is simply that the transaction is not cleaned up properly when the thread aborts. This was a problem with .Net 1.1, 2.0 and 2.0 SP1. We are running .Net 3.5 SP1.

Here is a sample program that illustrates the issue.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

using System.Data.SqlClient;
using System.Threading;

namespace ConsoleApplication1
{
    class Run
    {
        static Thread transactionThread;

        public class ConnectionHolder : IDisposable
        {
            public void Dispose()
            {
            }

            public void executeLongTransaction()
            {
                Console.WriteLine("Starting a long running transaction.");
                using (SqlConnection _con = new SqlConnection("Data Source=<YourServer>;Initial Catalog=<YourDB>;Integrated Security=True;Persist Security Info=False;Max Pool Size=200;MultipleActiveResultSets=True;Connect Timeout=30;Application Name=ConsoleApplication1.vshost"))
                {
                    try
                    {
                        SqlTransaction trans = null;
                        trans = _con.BeginTransaction();

                        SqlCommand cmd = new SqlCommand("update <YourTable> set Name = 'XXX' where ID = @0; waitfor delay '00:00:05'", _con, trans);
                        cmd.Parameters.Add(new SqlParameter("0", 340));
                        cmd.ExecuteNonQuery();

                        cmd.Transaction.Commit();

                        Console.WriteLine("Finished the long running transaction.");
                    }
                    catch (ThreadAbortException tae)
                    {
                        Console.WriteLine("Thread - caught ThreadAbortException in executeLongTransaction - resetting.");
                        Console.WriteLine("Exception message: {0}", tae.Message);
                    }
                }
            }
        }

        static void killTransactionThread()
        {
            Thread.Sleep(2 * 1000);

            // We're not doing this anywhere in our real code.  This is for simulation
            // purposes only!
            transactionThread.Abort();

            Console.WriteLine("Killing the transaction thread...");
        }

        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            using (var connectionHolder = new ConnectionHolder())
            {
                transactionThread = new Thread(connectionHolder.executeLongTransaction);
                transactionThread.Start();

                new Thread(killTransactionThread).Start();

                transactionThread.Join();

                Console.WriteLine("The transaction thread has died.  Please run 'select * from sysprocesses where open_tran > 0' now while this window remains open. \n\n");

                Console.Read();
            }
        }
    }
}

There is a Microsoft Hotfix targeted at .Net2.0 SP1 that was supposed to address this, but we obviously have newer DLL's (.Net 3.5 SP1) that don't match the version numbers listed in this hotfix.

Can anyone explain this behavior, and why the ThreadAbort is still not cleaning up the sql transaction properly? Does .Net 3.5 SP1 not include this hotfix, or is this behavior that is technically correct?


Solution

  • This is a bug in Microsoft's MARS implementation. Disabling MARS in your connection string will make the problem go away.

    If you require MARS, and are comfortable making your application dependent on another company's internal implementation, familiarize yourself with http://dotnet.sys-con.com/node/39040, break out .NET Reflector, and look at the connection and pool classes. You have to store a copy of the DbConnectionInternal property before the failure occurs. Later, use reflection to pass the reference to a deallocation method in the internal pooling class. This will stop your connection from lingering for 4:00 - 7:40 minutes.

    There are surely other ways to force the connection out of the pool and to be disposed. Short of a hotfix from Microsoft, though, reflection seems to be necessary. The public methods in the ADO.NET API don't seem to help.