Search code examples
c++thread-safetyterminate

How to handle quick_exit in a class member thread when `this` becomes a nullptr?


I have this small snippet of code:

// (...)
class Time 
{
  std::atomic<bool> m_running;
  std::thread m_worker;
  // ...
};


Time::Time()
{
  // ...
  m_running = true;
  m_worker  = std::move(std::thread(std::bind(&Time::Worker, this)));
}


bool Time::HasTimedOut() const
{
  return (!m_disabled) &&
         (IsPending() && (GetRunTime() >= m_maximum_timeout) && (CloseHandlesDiff() >= m_minimum_close_time));
}

Time::~Time()
{
  if (m_running)
  {
    m_running = false;
    m_worker.join();
  }
}

void Time::Worker()
{
  while (m_running)
  {
    if (time_data->HasTimedOut())
    {
      time_data->RunTimedOutCallback();
    }

    if (m_count < 0)
    {
      m_count = 0;
    }

    if (m_running)
    {
      std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
  }
}

std::shared_ptr<Time> time_data(std::make_shared<Time>());

To my surprise, I have gotten a coredump, and the backtrace command from gdb shows this:

(gdb) bt
#0  0x09438408 in monitor::Time::HasTimedOut (this=0x0)
    at monitor.cxx: // return (!m_disabled) &&
#1  0x09438a84 in monitor::Time::Worker (this=0xbd96dd8)
    monitor.cxx: // if(time_data->HasTimedOut()
#2  0x0943cf81 in std::__invoke_impl<void, void (monitor::Time::*&)(), monitor::Time*&> (
    __f=@0xbd96e54: (void (monitor::Time::*)(monitor::Time * const)) 0x94389f0 <monitor::Time::Worker()>, __t=@0xbd96e5c: 0xbd96dd8)

A nullptr seems to be the problem (SEGFAULT):

(this=0x0)

This means that my class got destroyed, without the destructor being called.

This might be possible when the OS/watchdog for my application does a force exit / quick terminate as far as I know/suspect.

Are there any ways to deal with this? Maybe some shared_ptr atomic wrapping where I could check if the shared_ptr is a nullptr, is there some atomic if-not-null-execute-this? Then again.. this happened literally mid-execution.

I know one can add quick-exit hooks, but the quick exit is sometimes used for good reason, it would be agains the design to slow down the quick-exit. What would be the best way to handle this becoming a nullptr?

Or should I just let the SEGFALT happen because the application is being quick-exited anyway?

Here is the full stack, but that probably won't add any more useful information:

(gdb) bt
(gdb) bt
#0  0x09438408 in monitor::Time::HasTimedOut (this=0x0)
    at /opt/procesleiding/vptlib/lib/oracle_monitor.cxx:111
#1  0x09438a84 in monitor::Time::Worker (this=0xbd96dd8)
    at /opt/procesleiding/vptlib/lib/oracle_monitor.cxx:142
#2  0x0943cf81 in std::__invoke_impl<void, void (monitor::Time::*&)(), monitor::Time*&> (
    __f=@0xbd96e54: (void (monitor::Time::*)(monitor::Time * const)) 0x94389f0 <monitor::Time::Worker()>, __t=@0xbd96e5c: 0xbd96dd8)
    at /usr/include/c++/7/bits/invoke.h:73
#3  0x0943c99f in std::__invoke<void (monitor::Time::*&)(), monitor::Time*&> (
    __fn=@0xbd96e54: (void (monitor::Time::*)(monitor::Time * const)) 0x94389f0 <monitor::Time::Worker()>, __args#0=@0xbd96e5c: 0xbd96dd8)
    at /usr/include/c++/7/bits/invoke.h:95
#4  0x0943c70c in std::_Bind<void (monitor::Time::*(monitor::Time*))()>::__call<void, , 0u>(std::tuple<>&&, std::_Index_tuple<0u>) (
    this=0xbd96e54, __args=...)
    at /usr/include/c++/7/functional:467
#5  0x0943c28c in std::_Bind<void (monitor::Time::*(monitor::Time*))()>::operator()<, void>() (this=0xbd96e54)
    at /usr/include/c++/7/functional:551
#6  0x0943bcaf in std::__invoke_impl<void, std::_Bind<void (monitor::Time::*(monitor::Time*))()>>(std::__invoke_other, std::_Bind<void (monitor::Time::*(monitor::Time*))()>&&) (__f=...) at /usr/include/c++/7/bits/invoke.h:60
#7  0x0943b022 in std::__invoke<std::_Bind<void (monitor::Time::*(monitor::Time*))()>>(std::_Bind<void (monitor::Time::*(monitor::Time*))()>&&) (__fn=...) at /usr/include/c++/7/bits/invoke.h:95
#8  0x0943e2a6 in std::thread::_Invoker<std::tuple<std::_Bind<void (monitor::Time::*(monitor::Time*))()> > >::_M_invoke<0u>(std::_Index_tuple<0u>) (this=0xbd96e54) at /usr/include/c++/7/thread:234
#9  0x0943e15c in std::thread::_Invoker<std::tuple<std::_Bind<void (monitor::Time::*(monitor::Time*))()> > >::operator()() (this=0xbd96e54) at /usr/include/c++/7/thread:243
#10 0x0943e067 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::_Bind<void (monitor::Time::*(monitor::Time*))()> > > >::_M_run() (this=0xbd96e50) at /usr/include/c++/7/thread:186

Solution

  • Your code has a race condition.

    Time::Time()
    {
      // ...
      m_running = true;
      m_worker  = std::move(std::thread(std::bind(&Time::Worker, this)));
    }
    

    Here tread starts before

    std::shared_ptr<Time> time_data(std::make_shared<Time>());
    

    is completed.

    Simply thread reaches monitor::Time::HasTimedOut before std::shared_ptr<Time> time_data(std::make_shared<Time>()); is completed.

    Spawn thread not in constructor, but in separate method which you will invoke after time_data is assigned.

    Anyway it would be better if your Timer do not use time_data global variable at all.