Tags: c++, exception, glibc, freeze, libstdc++

Program hangs while throwing an exception


I'm having an issue running a C++ program (a web server) on a shared hosting machine.

The program runs fine on my development machine, but when I try to run it on the hosting machine, it hangs while trying to throw an exception.

That it's trying to throw an exception isn't a problem; if it succeeded in throwing the exception, the exception would be caught a few stack frames up, and the web server would continue to run.

Here's the stack trace of the hanging thread:

#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:136
#1  0x00007f18e559669a in _L_lock_1088 () from /home/nr/lib/glibc-2.14.1/lib/libpthread.so.0
#2  0x00007f18e55964fa in __pthread_mutex_lock (mutex=0x7f18e66b6930) at pthread_mutex_lock.c:82
#3  0x00007f18e530f3db in __dl_iterate_phdr (callback=0x970100 <_Unwind_IteratePhdrCallback>, data=0x7f18e2fe9040) at dl-iteratephdr.c:42
#4  0x00000000009714e3 in _Unwind_Find_FDE ()
#5  0x000000000096daf6 in uw_frame_state_for ()
#6  0x000000000096ed40 in uw_init_context_1 ()
#7  0x000000000096f53e in _Unwind_RaiseException ()
#8  0x00000000008dfe7b in __cxa_throw () at ../../../../gcc-5.1/libstdc++-v3/libsupc++/eh_throw.cc:82
#9  0x000000000054ff6e in Wt::WEnvironment::getCookie(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () at /home/nr/dev/libraries/wt-3.3.4/src/Wt/WEnvironment.C:435
#10 0x000000000069a372 in Wt::WebSession::handleRequest(Wt::WebSession::Handler&) () at /home/nr/dev/libraries/wt-3.3.4/src/web/WebSession.C:1388
#11 0x000000000068a21c in Wt::WebController::handleRequest(Wt::WebRequest*) () at /home/nr/dev/libraries/wt-3.3.4/src/web/WebController.C:713
#12 0x00000000004d815b in boost::asio::detail::completion_handler<boost::_bi::bind_t<void, boost::_mfi::mf1<void, Wt::WebController, Wt::WebRequest*>, boost::_bi::list2<boost::_bi::value<Wt::WebController*>, boost::_bi::value<http::server::HTTPRequest*> > > >::do_complete(boost::asio::detail::task_io_service*, boost::asio::detail::task_io_service_operation*, boost::system::error_code const&, unsigned long) () at /home/nr/dev/dist/boost/include/boost/bind/mem_fn_template.hpp:165
#13 0x000000000056e4a2 in Wt::WIOService::run() () at /home/nr/dev/dist/boost/include/boost/asio/detail/task_io_service_operation.hpp:38
#14 0x0000000000810ff3 in thread_proxy ()
#15 0x00007f18e5593cea in start_thread (arg=0x7f18e2fec700) at pthread_create.c:301
#16 0x00007f18e52d8fcd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Since it works fine on the development machine, I suspect the problem is related to the two machines having different versions of some shared libraries, but I don't know which ones specifically. I link everything I can statically, including libstdc++, precisely to avoid problems like this.

Any suggestions of how to diagnose this problem further are appreciated.

EDIT: If it helps, the development machine runs Debian Jessie, while the hosting machine runs CentOS 6.8.
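A quick way to compare the two environments is to log which glibc the program was compiled against and which one it actually loaded at run time. This is a minimal sketch that assumes glibc: gnu_get_libc_version() and the __GLIBC__/__GLIBC_MINOR__ macros are glibc-specific, and the two function names are illustrative, not part of any library.

```cpp
#include <gnu/libc-version.h>  // gnu_get_libc_version() (glibc-specific)
#include <string>

// Version of the glibc headers this translation unit was compiled against.
// __GLIBC__ and __GLIBC_MINOR__ come from <features.h>, which every glibc
// header (including <gnu/libc-version.h>) pulls in.
std::string compiled_glibc_version() {
    return std::to_string(__GLIBC__) + "." + std::to_string(__GLIBC_MINOR__);
}

// Version of the libc.so.6 that was actually mapped into the process at run time.
std::string runtime_glibc_version() {
    return gnu_get_libc_version();
}
```

Logging both values at startup on each machine makes a build/run mismatch visible before any exception is thrown. Note that the run-time value reflects the libc.so.6 that was loaded, which is not necessarily from the same glibc build as the dynamic loader.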


Solution

  • I figured out the problem. It was indeed caused by the development and hosting machines having different versions of a shared library.

    I was already linking all C++ libraries statically, and only C libraries remained dynamically linked. Notably, glibc remained dynamically linked, because it doesn't support static linking well.

    The glibc version installed on the development machine was 2.19; on the hosting machine, it was 2.12.

    When I initially tried to run the program on the hosting machine, I got an error of the form:

    ./myapp: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by ./myapp)
    

    (It asked for 2.14 rather than 2.19 because the newest glibc functionality my program actually used was introduced in 2.14; since glibc is backward compatible, any glibc from 2.14 onward provides it.)
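    The backward-compatibility rule can be sketched as a numeric comparison of (major, minor) pairs. The helper names below are hypothetical and this is only an illustration: the dynamic loader itself checks for the version names (such as GLIBC_2.14) recorded in the binary rather than comparing numbers. Comparing numerically matters, because 2.9 is older than 2.14 even though the string "2.9" sorts after "2.14".

    ```cpp
    #include <string>
    #include <utility>

    // Hypothetical helper: split a glibc version string such as "2.14" or
    // "2.14.1" into (major, minor); any patch component is ignored.
    std::pair<int, int> parse_glibc_version(const std::string& v) {
        std::string::size_type dot = v.find('.');
        int major = std::stoi(v.substr(0, dot));
        int minor = 0;
        if (dot != std::string::npos) {
            minor = std::stoi(v.substr(dot + 1));  // stoi stops at the next '.'
        }
        return {major, minor};
    }

    // Hypothetical helper: a binary whose newest required symbol version is
    // GLIBC_<required> can run against any installed glibc >= <required>.
    // std::pair compares lexicographically: major first, then minor.
    bool glibc_satisfies(const std::string& installed, const std::string& required) {
        return parse_glibc_version(installed) >= parse_glibc_version(required);
    }
    ```

    With the versions from this question, glibc_satisfies("2.12", "2.14") is false (the hosting machine's stock glibc), while glibc_satisfies("2.19", "2.14") is true (the development machine).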

    In an attempt to fix this problem, I built glibc 2.14, uploaded its binaries to the hosting machine, and pointed my program to them using LD_LIBRARY_PATH. That made the above error go away, but then I got the hang that prompted me to post this question.

    The reason for the hang, it turns out, is that one glibc component has its path baked into the executable at link time and is not overridden by LD_LIBRARY_PATH: the dynamic loader (ld-linux.so).
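    The loader path lives in the executable's PT_INTERP program header; readelf -l prints it as "Requesting program interpreter". The sketch below reads it directly from an ELF file. It assumes a 64-bit ELF in the host's byte order and uses the Linux <elf.h> definitions; elf_interpreter is an illustrative name.

    ```cpp
    #include <cstring>
    #include <elf.h>      // Elf64_Ehdr, Elf64_Phdr, PT_INTERP, ELFMAG (Linux header)
    #include <fstream>
    #include <string>

    // Returns the PT_INTERP path (the loader recorded at link time) of a 64-bit
    // ELF file, or "" if the file is unreadable, not ELF64, or has no PT_INTERP
    // (for example, a statically linked executable).
    std::string elf_interpreter(const std::string& path) {
        std::ifstream f(path, std::ios::binary);
        Elf64_Ehdr eh{};
        if (!f.read(reinterpret_cast<char*>(&eh), static_cast<std::streamsize>(sizeof eh)))
            return "";
        if (std::memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64)
            return "";
        // Walk the program header table; e_phentsize is the stride between entries.
        for (unsigned i = 0; i < eh.e_phnum; ++i) {
            std::streamoff off = static_cast<std::streamoff>(eh.e_phoff) +
                                 static_cast<std::streamoff>(i) * eh.e_phentsize;
            Elf64_Phdr ph{};
            if (!f.seekg(off) ||
                !f.read(reinterpret_cast<char*>(&ph), static_cast<std::streamsize>(sizeof ph)))
                return "";
            if (ph.p_type == PT_INTERP) {
                std::string interp(static_cast<std::string::size_type>(ph.p_filesz), '\0');
                if (!f.seekg(static_cast<std::streamoff>(ph.p_offset)) ||
                    !f.read(&interp[0], static_cast<std::streamsize>(interp.size())))
                    return "";
                // p_filesz counts the terminating NUL; keep only the path itself.
                return interp.substr(0, interp.find('\0'));
            }
        }
        return "";
    }
    ```

    Pointing it at the built binary (or at /proc/self/exe from inside the program) shows which loader will be used, whatever LD_LIBRARY_PATH is set to.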

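    So I was using the hosting machine's glibc 2.12 loader together with the remaining libraries from glibc 2.14, and that doesn't work: a glibc build's loader and its libc.so.6 share internal data structures, so both must come from the same build.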
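    To see such a mix from inside the running process, one can list the shared objects that are actually mapped. This sketch reads the Linux-specific /proc/self/maps file rather than calling dl_iterate_phdr, because in the broken setup __dl_iterate_phdr is exactly where the thread hangs (frame #3 of the trace). It assumes mapped paths contain no spaces; mapped_shared_objects is an illustrative name.

    ```cpp
    #include <fstream>
    #include <set>
    #include <sstream>
    #include <string>

    // Returns the distinct file-backed shared objects mapped into this process,
    // read from /proc/self/maps. This does not go through the dynamic loader's
    // lock, so it should still work when ld.so and libc come from different builds.
    std::set<std::string> mapped_shared_objects() {
        std::set<std::string> paths;
        std::ifstream maps("/proc/self/maps");
        std::string line;
        while (std::getline(maps, line)) {
            // Line format: address perms offset dev inode [pathname]
            std::istringstream fields(line);
            std::string addr, perms, offset, dev, inode, path;
            fields >> addr >> perms >> offset >> dev >> inode >> path;
            if (path.find(".so") != std::string::npos) {
                paths.insert(path);  // anonymous mappings leave path empty and are skipped
            }
        }
        return paths;
    }
    ```

    In the setup described here, one would expect the list to contain libc and libpthread from /home/nr/lib/glibc-2.14.1/lib (the directory seen in frame #1) alongside a loader from the system's /lib64.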

    I solved this by changing the linker command that builds the program on the development machine so that it hardcodes the path of the glibc 2.14 loader on the hosting machine (with GCC and GNU ld, via -Wl,--dynamic-linker=<path>), as described in this answer (a big thanks to @EmployedRussian for writing that!).