Search code examples
c++qtboostdllsignals2

Access violation when using boost::signals2 and unloading DLLs


I'm trying to figure out this problem.

Suppose, you have a code that uses boost::signals2 for communicating between objects. Lets call them "colorscales". Code for these colorscales is usually situated in the same DLL as the code that uses them. Let's call it main.dll

But sometimes code from other DLLs needs to use these objects and this is where the problems begin.

Basically, the application is pretty big and most of the DLLs are loaded to do some work and then they are unloaded. This is not the case with DLL that contain colorscales code, it's neved unloaded during application normal runtime.

So, when one of the DLLs is loaded (lets call it tools.dll) and some code runs, it may want to use these colorscale objects and communicate with them, so I connect to the signals these objects provide.

The problem is that boost is pretty lazy and all clever, and when you disconnect() slots, it doesn't actually erase connection and stuff that is associated with it (like boost::bind object and such). It just sets a flag that this connection is now disconnected and cleans it up on later (actually, it clean up 2 of these objects when you connect new slots and 1 of them when you invoke signal as of version 1.57). You probably already see where this is coming to.

So, you when you don't need more tools, you disconnect these signals and then application unloads tools.dll.

Then at a later stage, some code executes from the main.dll which causes one of colorscale signals invoked. boost::signals2 goes to invoke it, but before it tries to clean up one disconnected slot. This is where access violation happens, because internally connection had a shared_state object or something like this, that tries to clean itself up in a thread-safe way. But it faces the problem, that the code that it tries to call is already not there, because DLL is unloaded, so the Access Violation exception is thrown.

I've tried to fix this by invoking signal with some dummy parameters before DLL is unloaded and also by connecting and then disconnecting more slots (this one was a stupid idea, because it doesn't solve problem, but just multiplies it) some predefined amount of times (2 or 3 times more than there are slots at all).

It worked, or I thought so, because now it doesn't crash instantly, but rather crashes the next time you load the same tools.dll. I still need to figure out where and why does it crash, but it's somewhere else inside boost.

So, I wanted to ask, what are my options of fixing it? My thoughts were

  • Implementing my own connection that works in a more simple way
  • Providing a more simple way to communicate, like, callbacks, for instance
  • Finding a workaround for boost being so lazy and smart.

Solution

  • Well, it seems that I've found the cause of the crash after the fix.

    So, basically, what happens, when you use the workaround described above (calling signal with dummy parameters multiple times), what it does is that it replaces _shared_state object that was created from boost code from main.dll by another _shared_state object that is created from boost code from tools.dll. This object maintains pointer to reference counter (of type derived from boost::detail::sp_counter_base) inside.

    Then the tools.dll unloads and the object remains, but its virtual table is pointing to the code that is no longer there. Let's look at the virtual table of the reference counter to understand what's going on.

        [0] 0x000007fed8a42fe5 tools.dll!boost::detail::sp_counted_impl_p<...>::`vector deleting destructor'(unsigned int)
        [1] 0x000007fed8a4181b tools.dll!boost::detail::sp_counted_impl_p<...>::dispose(void)
        [2] 0x000007fed8a4458e tools.dll!boost::detail::sp_counted_base::destroy(void)
        [3] 0x000007fed8a43c42 SegyTools.dll!boost::detail::sp_counted_impl_p<...>::get_deleter(class type_info const &)
        [4] 0x000007fed8a42da6 tools.dll!boost::detail::sp_counted_impl_p<...>::get_untyped_deleter(void)
    

    As you can see, all these method are connected to the disposal of reference counter, so the problem doesn't arise before you try to do the same trick second time. So, the trick with disconnecting all signals to try to get rid of all the code from tools.dll doesn't work as expected and the next time you try to do the trick, Access Violation occurs.