gcc, boost, solaris

Deadlock during static initialization


I'm running into a deadlock during static initialization on Solaris. The situation strongly resembles this user's problem.

My environment is:

  • Solaris 10
  • GCC 5.4 installed to a non-standard location
  • all relevant shared libraries are linked against the libstdc++ and/or libgcc_s libraries from that installation
  • Boost 1.45 (we're moving away from it soon, but for the moment that cannot change)
  • the problem occurs whether I link against the Boost libraries dynamically or statically

The symptoms:

  • The process deadlocks while executing boost::system::generic_category()
  • generic_category() is being called to initialize global static references in boost/system/error_code.hpp
  • If I shuffle the link order, putting -lboost_system ahead of the other libraries being linked in, the problem goes away.
  • If I set a breakpoint in generic_category() and, the first time it is hit, try to step over the first line, the breakpoint is hit again, this time while the same function is executing in a different shared library's _init(). The debugger never stops on the second line of the original generic_category() call.
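To make the second symptom concrete, here is a minimal sketch of the pattern involved: namespace-scope static references in a header, each initialized by calling a function that owns a function-local static. Everything lives in a hypothetical `sketch` namespace; the class bodies are simplified stand-ins, not Boost's code, and the reference names posix_category and errno_ecat are my assumption about what the header declares, so check them against your copy of error_code.hpp. In Boost, generic_category() is defined in the compiled library rather than inline, and `call_count()` is purely an illustration aid.

```cpp
#include <cassert>

namespace sketch {

// Simplified stand-in for boost::system::error_category.
class error_category {
public:
    virtual ~error_category() {}
    virtual const char* name() const = 0;
};

class generic_error_category : public error_category {
public:
    const char* name() const { return "generic"; }
};

// Illustration aid: counts how often generic_category() is entered.
inline int& call_count() {
    static int n = 0;
    return n;
}

// Stand-in for generic_category(). The function-local static is
// constructed on first entry, under a compiler-generated guard.
inline const error_category& generic_category() {
    ++call_count();
    static const generic_error_category instance;
    return instance;
}

// Header-level references. Because they are `static`, every translation
// unit that includes the header gets its own copies and its own dynamic
// initializer, which runs while the containing library is being loaded.
static const error_category& posix_category = generic_category();
static const error_category& errno_ecat     = generic_category();

}  // namespace sketch
```

Every shared library that includes such a header ends up calling generic_category() from its own initialization code, which would be consistent with the breakpoint being hit from more than one library's _init().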

Since stepping over the first line didn't work, I stepped into it and then stepped out; again the breakpoint was hit.

I restarted the process, stepped in after the breakpoint was hit, and began single-stepping. When I stepped over the call to boost::system::error_category::error_category(), I ran into the same problem.

I tried again, this time stepping one instruction at a time once I reached the error_category() call. The call goes through the PLT, which calls elf_rtbndr(); that is supposed to return the real function's address in %o0. But when I step over the call to elf_rtbndr(), the breakpoint is hit again instead of execution resuming where it left off.

When the breakpoint is hit for the second time, generic_category() is being called from some other shared library's _init(); that's when the deadlock occurs.
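The sequence described above can be modeled in a few lines. This is a hand-rolled sketch, not the real runtime: `Model`, `GuardState`, `run_model()`, and the "libA"/"libB" labels are all hypothetical, and the nested call is a stand-in for the lazily bound constructor call that ends up running another library's _init(). It assumes the static inside generic_category() is protected by a guard that makes later callers wait until construction finishes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the guard around a function-local static.
enum GuardState { NOT_STARTED, IN_PROGRESS, DONE };

struct Model {
    GuardState guard;
    bool nested_call_pending;
    std::vector<std::string> log;

    Model() : guard(NOT_STARTED), nested_call_pending(true) {}

    // Stand-in for generic_category(); `caller` names the library whose
    // initialization code made the call.
    void generic_category(const std::string& caller) {
        if (guard == DONE) {
            log.push_back(caller + ": already initialized");
            return;
        }
        if (guard == IN_PROGRESS) {
            // A real thread-safe guard would wait here. The holder is
            // this same thread, so the wait could never end. The model
            // records the event and returns instead of hanging.
            log.push_back(caller + ": guard held, a real guard would block");
            return;
        }
        guard = IN_PROGRESS;
        log.push_back(caller + ": constructing");
        // Stand-in for the constructor call that triggers another
        // library's _init() before the constructor has run.
        if (nested_call_pending) {
            nested_call_pending = false;
            generic_category("libB _init");
        }
        guard = DONE;
        log.push_back(caller + ": constructed");
    }
};

// Runs the scenario once and returns the recorded event log.
std::vector<std::string> run_model() {
    Model m;
    m.generic_category("libA _init");
    assert(m.guard == DONE);
    return m.log;
}
```

Calling `run_model()` yields three entries; the middle one is the re-entrant call from the second library, which is the point where the real process would hang.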

Thanks in advance for your time & help.


Solution

  • This has been reported several times (see this post in Boost and another in GCC). It appears to be a circular dependency during Boost initialization which, for some reason, only manifests on Solaris. The usual advice is to work around it by changing the library initialization order (e.g. by reordering the link line, as you did with -lboost_system).

    Another option is to disable the thread-safe guards on function-local statics (the -fno-threadsafe-statics flag). That would get rid of the deadlock but leave the buggy nested constructor call in place, which is undesirable.
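As a rough illustration of why the nested call remains a problem without thread-safe guards, here is a small model. It assumes the non-thread-safe guard is just a flag that is set after the constructor returns; `UnguardedModel` and `constructor_runs_without_blocking()` are hypothetical names, and the nested call again stands in for the re-entry from another library's _init().

```cpp
#include <cassert>

// Hypothetical model of a non-thread-safe guard: a plain flag that is
// set only once the constructor has returned.
struct UnguardedModel {
    bool initialized;
    int constructor_runs;
    bool nested_call_pending;

    UnguardedModel()
        : initialized(false), constructor_runs(0), nested_call_pending(true) {}

    // Stand-in for generic_category().
    void generic_category() {
        if (initialized) return;
        ++constructor_runs;  // the constructor starts running
        if (nested_call_pending) {
            nested_call_pending = false;
            // Re-entry still sees initialized == false, so the
            // constructor starts again on the same object.
            generic_category();
        }
        initialized = true;  // flag set after construction
    }
};

// Runs the scenario once and reports how often the constructor started.
int constructor_runs_without_blocking() {
    UnguardedModel m;
    m.generic_category();
    assert(m.initialized);
    return m.constructor_runs;
}
```

In this model nothing blocks, but the constructor is started twice for the same object, which is the nested constructor call mentioned above.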