Is there any way to work around OS loader lock deadlocks caused by third-party libraries?

I have an interesting problem that I haven't seen documented anywhere else (at least not this specific issue).

This issue involves a combination of COM, VB6, and .NET, and getting them to play nicely together.

Here's what I have:

A legacy VB6 ActiveX DLL (written by us)
A multi-threaded Windows service written in C# that processes requests from clients over the network and sends back results. It does this by creating a new STA thread to handle each request. Each request-handler thread instantiates a COM object (defined in the ActiveX DLL) to process the request and get the result (a string of XML is passed in, and it returns a string of XML back), explicitly releases the COM object, and exits. The service then sends the result back to the client.
All of the network code is handled using asynchronous networking (i.e. thread pool threads).

And yes, I know this is already a risky thing to be doing in the first place, since VB6 isn't very friendly with multi-threaded applications to begin with, but unfortunately it's what I am stuck with for the moment.

I've already fixed a number of things that were causing deadlocks in the code (for example, making sure the COM objects are actually created and called from a separate STA thread, making sure to explicitly release the COM objects before the thread exits to prevent deadlocks that were occurring between the garbage collector and the COM Interop code, etc.), but there is one deadlock scenario that I just can't seem to solve.

With some help from WinDbg, I was able to figure out what is happening, but I'm not sure how (or if) there is a way around this particular deadlock.

What's happening

If one request-handler thread is exiting, and another request-handler thread is starting at the same time, a deadlock can occur because of the way the VB6 runtime initialization and termination routines seem to work.

The deadlock occurs in the following scenario:

The new thread that is starting up is in the middle of creating a new instance of the (VB6) COM object to process an incoming request. At this point, the COM runtime is in the middle of a call to retrieve the object's class factory. The class factory implementation is in the VB6 runtime itself (MSVBVM60.dll). That is, it's calling the VB6 runtime's DllGetClassObject function. This, in turn, calls an internal runtime function (MSVBVM60!CThreadPool::InitRuntime), which acquires a mutex and enters a critical section to do part of its work. At this point, it's about to call LoadLibrary to load oleaut32.dll into the process, while holding this mutex. So, now it's holding this internal VB6 runtime mutex and waiting for the OS loader lock.
The thread that is exiting is already running inside the loader lock, because it's done executing managed code and is executing inside the KERNEL32!ExitThread function. Specifically, it's in the middle of handling the DLL_THREAD_DETACH notification for MSVBVM60.dll on that thread, which in turn calls a method to terminate the VB6 runtime on the thread (MSVBVM60!CThreadPool::TerminateRuntime). Now, this thread tries to acquire the same mutex that the thread being initialized already holds.

A classic deadlock. Thread A has L1 and wants L2, but Thread B has L2 and needs L1.
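
To make the lock-order inversion concrete, here is a minimal sketch that reproduces the same shape with two plain .NET monitors. LoaderLock and RuntimeMutex are only stand-ins (assumptions) for the OS loader lock and the internal VB6 runtime mutex, and LockOrderDemo and TakeInOrder are hypothetical names. A timeout is used so that the demo reports the stall instead of hanging the way the real service does.

```csharp
using System;
using System.Threading;

public static class LockOrderDemo
{
    // Stand-ins (assumptions) for the OS loader lock and the VB6 runtime mutex.
    private static readonly object LoaderLock = new object();
    private static readonly object RuntimeMutex = new object();
    private static int _acquired;

    private static void TakeInOrder(object first, object second, Barrier barrier)
    {
        lock (first)
        {
            // Wait until both threads hold their first lock.
            barrier.SignalAndWait();

            // Without the timeout this call would block forever.
            if (Monitor.TryEnter(second, 200))
            {
                Interlocked.Increment(ref _acquired);
                Monitor.Exit(second);
            }

            // Keep the first lock held until both attempts have finished.
            barrier.SignalAndWait();
        }
    }

    // Returns how many of the two threads managed to take their second lock.
    public static int Run()
    {
        _acquired = 0;
        using (Barrier barrier = new Barrier(2))
        {
            // "Starting" thread: runtime mutex first, then the loader lock.
            Thread starting = new Thread(delegate()
            {
                TakeInOrder(RuntimeMutex, LoaderLock, barrier);
            });
            // "Exiting" thread: loader lock first, then the runtime mutex.
            Thread exiting = new Thread(delegate()
            {
                TakeInOrder(LoaderLock, RuntimeMutex, barrier);
            });

            starting.Start();
            exiting.Start();
            starting.Join();
            exiting.Join();
        }
        return _acquired;
    }
}
```

Because each thread keeps its first lock until both attempts are over, neither TryEnter call can succeed, which is the situation the two request-handler threads end up in (minus the timeout).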

The problem (if you've followed me this far) is I don't have any control over what the VB6 runtime is doing in its internal thread initialization and teardown routines.

In theory, if I could force the VB6 runtime initialization code to run inside the OS loader lock, I would prevent the deadlock, because I am fairly certain the mutex the VB6 runtime is holding is specifically only used inside the initialization and termination routines.

Requirements

I can't make the COM calls from a single STA thread, because then the service won't be able to handle concurrent requests. I can't have a long-running request block other client requests either. This is why I create one STA thread per-request.
I need to create a new instance of the COM object on each thread, because I need to make sure each instance has its own copy of global variables in the VB6 code (VB6 gives each thread its own copy of all global variables).

Solutions I've tried that didn't work

Converted ActiveX DLL to ActiveX EXE

First, I tried the obvious solution and created an ActiveX EXE (out-of-process server) to handle the COM calls. Initially, I compiled it so that a new ActiveX EXE (process) was created for each incoming request, and I also tried it with the Thread Per Object compile option (one process instance is created, and it creates each object on a new thread within the ActiveX EXE).

This fixes the deadlock issue with respect to the VB6 runtime, because the VB6 runtime never gets loaded into the .NET code proper. However, this led to a different problem: if concurrent requests come into the service, the ActiveX EXE tends to fail randomly with RPC_E_SERVERFAULT errors. I assume this is because the COM marshalling and/or the VB6 runtime can't deal with concurrent object creation/destruction, or concurrent method calls, inside the ActiveX EXE.

Force the VB6 code to run inside the OS loader lock

Next, I switched back to using an ActiveX DLL for the COM class. To force the VB6 runtime to run its thread initialization code inside the OS loader lock, I created a native (Win32) C++ DLL, with code to handle DLL_THREAD_ATTACH in DllMain. The DLL_THREAD_ATTACH code calls CoInitialize and then instantiates a dummy VB6 class to force the VB6 runtime to be loaded and force the runtime initialization routine to run on the thread.

When the Windows service starts, I use LoadLibrary to load this C++ DLL into memory, so that any threads created by the service will execute that DLL's DLL_THREAD_ATTACH code.

The problem is that this code runs for every thread the service creates, including the .NET garbage collector thread and the thread-pool threads used by the async networking code, which doesn't end well (this just seems to cause the threads to never start properly, and I imagine initializing COM on the GC and thread-pool threads is in general just a very bad idea).

Addendum

I just realized why this is a bad idea (and probably part of the reason it didn't work): it isn't safe to call LoadLibrary while you are holding the loader lock. See the Remarks section of this MSDN article: http://msdn.microsoft.com/en-us/library/ms682583%28VS.85%29.aspx, specifically:

Threads in DllMain hold the loader lock so no additional DLLs can be dynamically loaded or initialized.

Is there any way to work around these issues?

So, my question is, is there any way to work around the original deadlock issue?

The only other thing I can think of is to create my own lock object and surround the code that instantiates the COM object in a .NET lock block, but then I have no way (that I know of) to put the same lock around the (operating system's) thread exit code.

Is there a more obvious solution to this issue, or am I plain out of luck here?

Solution

Since I'm still exploring my options, I wanted to see whether I could implement a solution in pure .NET code, without any native code, for the sake of simplicity. I'm not sure yet whether this is a fool-proof solution, because I'm still trying to figure out whether it actually gives me the mutual exclusion I need, or whether it just looks like it does.

Any thoughts or comments are welcome.

The relevant part of the code is below. Some notes:

The HandleRpcRequest method is called from a thread-pool thread when a new message is received from a remote client
This fires off a separate STA thread so that it can make the COM call safely
DbRequestProxy is a thin wrapper class around the real COM class I'm using
I used a ManualResetEvent (_safeForNewThread) to provide the mutual exclusion. The basic idea is that this event stays unsignaled (blocking other threads) if any one particular thread is about to exit (and hence potentially about to terminate the VB6 runtime). The event is only signaled again after the current thread completely terminates (after the Join call finishes). This way multiple request-handler threads can still execute concurrently unless an existing thread is exiting.

So far, I think this code is correct and guarantees that two threads can't deadlock in the VB6 runtime initialization/termination code anymore, while still allowing them to execute concurrently for most of their execution time, but I could be missing something here.

public class ClientHandler {

    private static ManualResetEvent _safeForNewThread = new ManualResetEvent(true);

    private void HandleRpcRequest(string request)
    {

        Thread rpcThread = new Thread(delegate()
        {
            DbRequestProxy dbRequest = null;

            try
            {
                Thread.BeginThreadAffinity();

                string response = null;

                // Creates a COM object. The VB6 runtime initializes itself here.
                // Other threads can be executing here at the same time without fear
                // of a deadlock, because the VB6 runtime lock is re-entrant.

                dbRequest = new DbRequestProxy();

                // Call the COM object
                response = dbRequest.ProcessDBRequest(request);

                // Send response back to client
                _messenger.Send(Messages.RpcResponse(response), true);
            }
            catch (Exception ex)
            {
                _messenger.Send(Messages.Error(ex.ToString()));
            }
            finally
            {
                if (dbRequest != null)
                {
                    // Force release of COM objects and VB6 globals
                    // to prevent a different deadlock scenario with VB6
                    // and the .NET garbage collector/finalizer threads
                    dbRequest.Dispose();
                }

                // Other request threads cannot start right now, because
                // we're exiting this thread, which will detach the VB6 runtime
                // when the underlying native thread exits

                _safeForNewThread.Reset();
                Thread.EndThreadAffinity();
            }
        });

        // Make sure we can start a new thread (i.e. another thread
        // isn't in the middle of exiting...)

        _safeForNewThread.WaitOne();

        // Put the thread into an STA, start it up, and wait for
        // it to end. If other requests come in, they'll get picked
        // up by other thread-pool threads, so we won't usually be blocking anyone
        // by doing this (although we are blocking a thread-pool thread, so
        // hopefully we don't block for *too* long).

        rpcThread.SetApartmentState(ApartmentState.STA);
        rpcThread.Start();
        rpcThread.Join();

        // Since we've joined the thread, we know at this point
        // that any DLL_THREAD_DETACH notifications have been handled
        // and that the underlying native thread has completely terminated.
        // Hence, other threads can safely be started.

        _safeForNewThread.Set();

    }
}
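
One possible gap in the event-based version above: a ManualResetEvent lets every waiter through while it is signaled, so a request that has already passed WaitOne can still be creating its COM object at the moment another worker calls Reset and exits. Set can also run while a different worker is exiting. A stricter variant, sketched below under stated assumptions, uses a single-permit Semaphore held (a) while the COM object is created and (b) from just before a worker returns until the parent has joined it. Semaphore has no thread affinity, so the worker can take the permit and the parent can release it. StaGate, Run, create, and use are hypothetical names; create and use stand in for "new DbRequestProxy()" and "ProcessDBRequest(request)". The sketch also relies on the same premise as the code above, namely that Join returns only after the native thread has handled DLL_THREAD_DETACH, and it is Windows-only because of the STA setting.

```csharp
using System;
using System.Threading;

public static class StaGate
{
    // One permit: held while a COM object is being created, and from just
    // before a worker returns until its native thread has been joined.
    private static readonly Semaphore InitOrExit = new Semaphore(1, 1);

    // create/use are hypothetical stand-ins for constructing the COM
    // wrapper and calling it.
    public static string Run(Func<IDisposable> create, Func<IDisposable, string> use)
    {
        string result = null;
        bool exitPermitHeld = false;

        Thread worker = new Thread(delegate()
        {
            IDisposable obj = null;
            try
            {
                // No thread may be exiting while the runtime initializes.
                InitOrExit.WaitOne();
                try { obj = create(); }
                finally { InitOrExit.Release(); }

                // The actual request runs concurrently with other workers.
                result = use(obj);
            }
            catch (Exception ex)
            {
                result = ex.ToString();
            }
            finally
            {
                if (obj != null) obj.Dispose();

                // Taken here by the worker, released by the parent after Join,
                // so no other worker can be creating its object during exit.
                InitOrExit.WaitOne();
                exitPermitHeld = true;
            }
        });

        worker.SetApartmentState(ApartmentState.STA);
        worker.Start();
        worker.Join();

        if (exitPermitHeld) InitOrExit.Release();
        return result;
    }
}
```

The trade-off is that object creation and thread exit are fully serialized across requests, while the potentially long-running call in between is not. Whether that is enough to avoid the VB6 runtime deadlock in practice is something I would still want to confirm under load.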