Reverse P/Invoke (Also Known As Managed Callback to Unmanaged Code)

The included C# unit test and C code file attempt to pass a managed callback to unmanaged code. The code runs, but the count variable never increments, so the test fails.

The fact that it runs at all means that it loads the DLL, finds the reference to the DoCallBack() method, and seems to call it. But nothing happens, so something is off.

You probably want to know why we're attempting this, and whether there's a better approach. Well, the end goal is to create a "hack" to run threads across AppDomains at nearly the same performance as within a single domain.

At the following link you'll find the fastest technique so far for cross-AppDomain calls. The MS .NET AddIn team offers "FastPath", which improves a lot over simple remoting performance. We ran their example on .NET 3.5 and it works very fast after putting their AddIn contracts into the GAC.

http://blogs.msdn.com/b/clraddins/archive/2008/02/22/add-in-performance-what-can-you-expect-as-you-cross-an-isolation-boundary-and-how-to-make-it-better-jesse-kaplan.aspx

Now let's discuss some timing comparisons to see why that's still not fast enough for our needs. Normal cross-domain remoting offers approximately 10,000 calls per second on a method with zero arguments. With the FastPath option that increases to 200,000 calls per second. But compare that to a plain C# call to an interface method with zero arguments (in the same domain): that does over 160,000,000 operations per second on the same machine as the other tests.

So even the FastPath technique is still roughly 800 times slower than a simple interface method call (160,000,000 / 200,000 = 800). But why do we need better performance?

Our performance requirements are to remove all software bottlenecks from a CPU-bound application that processes billions of tuples of information in mere minutes using multicore and distributed technology.

But a new feature requirement will be an AddIn, or plugin, architecture so that components can be loaded or unloaded without stopping the rest of the system. The only way to do that effectively on .NET is with separate AppDomains.

Please note that we don't want to communicate data across AppDomains; they all operate independently in parallel.

But as far as threading goes, it's very inefficient to have a separate thread running in each of hundreds of AppDomains. They would compete for CPU and cause a huge loss of performance from context switching.

So again, the plan here is to have a primary or master domain which has a thread pool and takes turns calling into each AppDomain that has work to do, letting it work for a while. That means cooperative multi-threading (to avoid context switching): each AppDomain returns so that the main AppDomain can move on to the others.

Unfortunately, each AppDomain can't run very long independently before it runs out of work and needs to return to the master domain to let a different AppDomain do work. So the 200,000 calls per second of the FastPath technique will cause a significant slowdown in overall performance due to cross-AppDomain calls.

In contrast, with the P/Invoke below we have measured, using Stopwatch, over 90,000,000 (that's 90 million) calls per second on the same machine as the other tests. So the hope is that reverse P/Invoking into a different AppDomain will still allow for many millions of operations per second.

90 million per second is much closer to our need for switching threads between AppDomains.

Okay. Now back to this unit test. The purpose of this simple unit test is to first get simple callbacks from unmanaged to managed code working. After that, the next step will be to create a separate AppDomain, get a delegate callback to it, and pass that to the unmanaged code to test the cross-domain callback performance.

We know that all this is possible because we see discussion and examples on the web. But while the code below seems simple, it's just not working as expected.

Here's the unmanaged code, built as a DLL without the /clr command-line option:

#include <windows.h>

typedef void (CALLBACK *PFN_MYCALLBACK)();
int count = 0;

extern "C" {
    __declspec(dllexport) void __stdcall DoSomeStuff() {
        ++count;
    }
}

extern "C" {
    __declspec(dllexport) void __stdcall DoCallBack(PFN_MYCALLBACK callback) {
        PFN_MYCALLBACK();
    }
}

Here's the C# unit test code.

using System.Runtime.InteropServices;
using System.Security;
using NUnit.Framework;

namespace TickZoom.Callback
{
    [SuppressUnmanagedCodeSecurity]
    [UnmanagedFunctionPointer(CallingConvention.Cdecl)]
    public delegate void MyCallback();

    [TestFixture]
    class CrossAppDomainTesting
    {
        public int count;
        [Test]
        public void TestCallback()
        {
            NativeMethods.DoCallBack(
                delegate()
                {
                    ++count;
                });
            Assert.AreEqual(1, count);
        }

        [Test]
        public void TestDate()
        {
            NativeMethods.DoSomeStuff();
        }
    }

    public static class NativeMethods
    {
        [SuppressUnmanagedCodeSecurity]
        [DllImport("CrossAppDomain.dll")]
        public static extern void DoSomeStuff();

        [SuppressUnmanagedCodeSecurity]
        [DllImport("CrossAppDomain.dll")]
        public static extern void DoCallBack(MyCallback callback);

    }
}

Solution

The first commenter on the post solved the coding problem. I waited for him to post it as an answer so that I could give him credit. If he does, I will.

Is that just a typo in your DoCallback function? You have PFN_MYCALLBACK(). I think you want it to be callback(). – Jim Mischel
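For reference, here is a minimal sketch of what the corrected export could look like. The statement PFN_MYCALLBACK() only creates a null temporary of the pointer type and calls nothing, which is why the count never moved. The EXPORT_API and API_CALL macros and the CountingCallback helper are my own additions so the sketch also compiles outside Windows; the real DLL would keep windows.h, __declspec(dllexport) and __stdcall as in the listing above.

```cpp
#include <cassert>

// Stand-ins so the sketch builds on any platform (assumption, not in the original).
#if defined(_WIN32)
#include <windows.h>
#define EXPORT_API __declspec(dllexport)
#define API_CALL __stdcall
#else
#define CALLBACK
#define EXPORT_API
#define API_CALL
#endif

typedef void (CALLBACK *PFN_MYCALLBACK)();

extern "C" EXPORT_API void API_CALL DoCallBack(PFN_MYCALLBACK callback) {
    // Call through the parameter. The original PFN_MYCALLBACK() named the
    // type, which just value-initializes a temporary pointer and discards it.
    if (callback != nullptr) {
        callback();
    }
}

// Hypothetical native callback, used only to exercise DoCallBack here.
static int g_callbackCount = 0;
static void CALLBACK CountingCallback() { ++g_callbackCount; }
```

Calling DoCallBack(CountingCallback) once should leave g_callbackCount at 1, which mirrors what the C# test asserts on its own count field.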

Also, the timings show that the fastest possible way to call from one AppDomain across to another AppDomain is the following:

You first call an unmanaged method to send over a delegate to unmanaged code which gets marshaled to a function pointer.

From that point on you call the unmanaged code without any arguments, and it reuses the stored function pointer to call into the other AppDomain.

Our testing shows that this works at 90 million calls per second, compared to 300 million per second for a simple C# call on an interface, or 80 million per second for an Interlocked.Increment().

In other words, this is fast enough to happen quite often to transition threads across AppDomain boundaries.
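The register-once, call-many pattern described above might look roughly like this on the unmanaged side. This is only a sketch: RegisterCallback, InvokeCallback, PFN_DOMAINCALLBACK and the Ping helper are hypothetical names, not our actual exports, and the dllexport and calling-convention decorations are left out to keep it portable.

```cpp
#include <cassert>

// Hypothetical pointer type for the marshaled delegate (no arguments, no result).
typedef void (*PFN_DOMAINCALLBACK)();

// The function pointer handed over by the first call; reused by every later call.
static PFN_DOMAINCALLBACK g_target = nullptr;

// Step 1: called once with the delegate, which arrives as a function pointer.
extern "C" void RegisterCallback(PFN_DOMAINCALLBACK callback) {
    g_target = callback;
}

// Step 2: called many times with no arguments, so nothing is marshaled per call.
// Returns 1 if a callback was invoked, 0 if none is registered.
extern "C" int InvokeCallback() {
    PFN_DOMAINCALLBACK target = g_target;
    if (target == nullptr) {
        return 0;
    }
    target();
    return 1;
}

// Hypothetical native stand-in for the managed delegate, for trying the sketch.
static int g_pingCount = 0;
static void Ping() { ++g_pingCount; }
```

The point of splitting it this way is that the only marshaling cost is paid in RegisterCallback; InvokeCallback takes no arguments at all. The sketch holds a single pointer and is not thread-safe.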

NOTE: There are a couple of things to be careful with. If you keep pointers to AppDomains that you unload and then try to call them, you'll get an exception about a collected delegate. The reason for this is that the function pointer the CLR gives you isn't simply a function pointer into the code. Instead it's a pointer to a piece of "thunking" code which first checks that the delegate is still around and does a little housekeeping for the transition from unmanaged to managed code.

Our plan is to assign each AppDomain an integer handle. Then the unmanaged code will get both the "handle" and the function pointer to put into an array.

When we unload an AppDomain we'll also inform the unmanaged code to remove the function pointer for that handle. We'll keep the freed 'handle' on a free list to reuse for the next AppDomain created.
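As a rough sketch of that handle plan, the unmanaged side could keep the function pointers in an array indexed by handle, with a free list for recycled handles. Everything here is hypothetical (CallbackTable, PFN_DOMAINWORK, the method names), and for simplicity the table hands out the handles itself, whereas in our plan the handle may well be assigned on the managed side and passed in together with the pointer.

```cpp
#include <cassert>
#include <vector>

// Hypothetical pointer type for a per-AppDomain callback.
typedef void (*PFN_DOMAINWORK)();

class CallbackTable {
public:
    // Stores the pointer and returns its handle, reusing a freed handle when
    // one is available. Returns -1 for a null pointer.
    int Add(PFN_DOMAINWORK callback) {
        if (callback == nullptr) return -1;
        if (!freeHandles_.empty()) {
            int handle = freeHandles_.back();
            freeHandles_.pop_back();
            slots_[handle] = callback;
            return handle;
        }
        slots_.push_back(callback);
        return static_cast<int>(slots_.size()) - 1;
    }

    // Clears the slot (to be called when the AppDomain is unloaded) and puts
    // the handle on the free list. Unknown or already freed handles are ignored.
    void Remove(int handle) {
        if (!IsLive(handle)) return;
        slots_[handle] = nullptr;
        freeHandles_.push_back(handle);
    }

    // Calls the pointer stored for the handle; returns false instead of
    // calling through a slot that has been removed.
    bool Invoke(int handle) const {
        if (!IsLive(handle)) return false;
        slots_[handle]();
        return true;
    }

private:
    bool IsLive(int handle) const {
        return handle >= 0 && handle < static_cast<int>(slots_.size()) &&
               slots_[handle] != nullptr;
    }

    std::vector<PFN_DOMAINWORK> slots_;  // indexed by handle
    std::vector<int> freeHandles_;       // handles available for reuse
};

// Hypothetical native callback, used only to try the table out.
static int g_tableCalls = 0;
static void TableTestCallback() { ++g_tableCalls; }
```

Because Remove nulls the slot before the handle is recycled, a stale handle makes Invoke return false rather than jump into a thunk whose delegate has been collected. The sketch is not thread-safe; a thread pool calling into it would need its own synchronization.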