Search code examples
c++windowsboost

Boost interprocess mutexes and checking for abandonment


I have a need for interprocess synchronization around a piece of hardware. Because this code will need to work on Windows and Linux, I'm wrapping with Boost Interprocess mutexes. Everything works well accept my method for checking abandonment of the mutex. There is the potential that this can happen and so I must prepare for it.

I've abandoned the mutex in my testing and, sure enough, when I use scoped_lock to lock the mutex, the process blocks indefinitely. I figured the way around this is by using the timeout mechanism on scoped_lock (since much time spent Googling for methods to account for this don't really show much, boost doesn't do much around this because of portability reasons).

Without further ado, here's what I have:

#include <boost/interprocess/sync/named_recursive_mutex.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>

typedef boost::interprocess::named_recursive_mutex MyMutex;
typedef boost::interprocess::scoped_lock<MyMutex> ScopedLock;

MyMutex* pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");

{
    // ScopedLock lock(*pGate); // this blocks indefinitely
    boost::posix_time::ptime timeout(boost::posix_time::microsec_clock::local_time() + boost::posix_time::seconds(10));
    ScopedLock lock(*pGate, timeout); // a 10 second timeout that returns immediately if the mutex is abandoned ?????
    if(!lock.owns()) {
        delete pGate;
        boost::interprocess::named_recursive_mutex::remove("MutexName");
        pGate = new MyMutex(boost::interprocess::open_or_create, "MutexName");
    }
}

That, at least, is the idea. Three interesting points:

  • When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected.
  • When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. Ok, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it too?
  • When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.

So, what am I missing on using these objects? Perhaps it's staring me in the face, but I can't see it and so I'm asking for help.

I should also mention that, because of how this hardware works, if the process cannot gain ownership of the mutex within 10 seconds, the mutex is abandoned. In fact, I could probably wait as little as 50 or 60 milliseconds, but 10 seconds is a nice "round" number of generosity.

I'm compiling on Windows 7 using Visual Studio 2010.

Thanks, Andy


Solution

  • When I don't use the timeout object, and the mutex is abandoned, the ScopedLock ctor blocks indefinitely. That's expected

    The best solution for your problem would be if boost had support for robust mutexes. However Boost currently does not support robust mutexes. There is only a plan to emulate robust mutexes, because only linux has native support on that. The emulation is still just planned by Ion Gaztanaga, the library author. Check this link about a possible hacking of rubust mutexes into the boost libs: http://boost.2283326.n4.nabble.com/boost-interprocess-gt-1-45-robust-mutexes-td3416151.html

    Meanwhile you might try to use atomic variables in a shared segment.

    Also take a look at this stackoverflow entry: How do I take ownership of an abandoned boost::interprocess::interprocess_mutex?

    When I do use the timeout, and the mutex is abandoned, the ScopedLock ctor returns immediately and tells me that it doesn't own the mutex. Ok, perhaps that's normal, but why isn't it waiting for the 10 seconds I'm telling it too?

    This is very strange, you should not get this behavior. However: The timed lock is possibly implemented in terms of the try lock. Check this documentation: http://www.boost.org/doc/libs/1_53_0/doc/html/boost/interprocess/scoped_lock.html#idp57421760-bb This means, the implementation of the timed lock might throw an exception internally and then returns false.

    inline bool windows_mutex::timed_lock(const boost::posix_time::ptime &abs_time)
    {
       sync_handles &handles =
          windows_intermodule_singleton<sync_handles>::get();
       //This can throw
       winapi_mutex_functions mut(handles.obtain_mutex(this->id_));
       return mut.timed_lock(abs_time);
    }
    

    Possibly, the handle cannot be obtained, because the mutex is abandoned.

    When the mutex isn't abandoned, and I use the timeout, the ScopedLock ctor still returns immediately, telling me that it couldn't lock, or take ownership, of the mutex and I go through the motions of removing the mutex and remaking it. This is not at all what I want.

    I am not sure about this one, but I think the named mutex is implemented by using a shared memory. If you are using Linux, check for the file /dev/shm/MutexName. In Linux, a file descriptor remains valid until that is not closed, no matter if you have removed the file itself by e.g. boost::interprocess::named_recursive_mutex::remove.