Tags: c++, multithreading, c++11, memory-barriers

C++: Release barriers required in constructor that creates a thread that accesses the constructed object


If I create a thread in a constructor, and that thread accesses the constructed object, do I need to introduce a release barrier before the thread accesses the object? Specifically, given the code below (wandbox link), do I need to lock the mutex in the constructor (the commented-out line)? I need to make sure that worker_thread_ sees the write to run_worker_thread_ so that it doesn't immediately exit. I realize using an atomic boolean would be better here, but I'm interested in understanding the memory-ordering implications. Based on my understanding, I think I do need to lock the mutex in the constructor, so that the release operation provided by unlocking the mutex in the constructor synchronizes with the acquire operation provided by locking the mutex in threadLoop() via the call to shouldRun().

#include <atomic>
#include <chrono>
#include <iostream>
#include <mutex>
#include <string>
#include <thread>

using std::cout;

class ThreadLooper {
 public:
   ThreadLooper(std::string thread_name)
       : thread_name_{std::move(thread_name)}, loop_counter_{0} {
        //std::lock_guard<std::mutex> lock(mutex_);
        run_worker_thread_ = true;
        worker_thread_ = std::thread([this]() { threadLoop(); });
        // mutex unlock provides release semantics
   }

   ~ThreadLooper() {
     {
        std::lock_guard<std::mutex> lock(mutex_);
        run_worker_thread_ = false;
     }
     if (worker_thread_.joinable()) {
       worker_thread_.join();
     }
     cout << thread_name_ << ": destroyed and counter is " << loop_counter_
          << std::endl;     
   }

 private:
  bool shouldRun() {
      std::lock_guard<std::mutex> lock(mutex_);
      return run_worker_thread_;
  }

  void threadLoop() {
    cout << thread_name_ << ": threadLoop() started running"
         << std::endl;
    while (shouldRun()) {
      using namespace std::literals::chrono_literals;
      std::this_thread::sleep_for(2s);
      ++loop_counter_;
      cout << thread_name_ << ": counter is " << loop_counter_ << std::endl;
    }
    cout << thread_name_
         << ": exiting threadLoop() because flag is false" << std::endl;
  }

  const std::string thread_name_;
  std::atomic_uint64_t loop_counter_;
  bool run_worker_thread_;
  std::mutex mutex_;
  std::thread worker_thread_;
};

This also got me thinking more generally: if I were to initialize a bunch of regular int (not atomic) member variables in the constructor that were then read from other threads via some public methods, would I similarly need to lock the mutex in the constructor in addition to in the methods that read those variables? This seems slightly different from the case above, since I know the object would be fully constructed before any other thread could access it, but that by itself doesn't seem to ensure that the initialization of the object would be visible to the other threads without a release operation in the constructor.
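
For concreteness, a minimal sketch of that second scenario (the class and member names are made up just to illustrate the question):

#include <thread>

class Config {
 public:
  Config() : max_retries_{5}, timeout_ms_{250} {}   // plain ints, no lock taken here

  int maxRetries() const { return max_retries_; }   // later read from other threads
  int timeoutMs() const { return timeout_ms_; }

 private:
  int max_retries_;
  int timeout_ms_;
};

// The object is fully constructed first and only then handed to another thread;
// the question is whether those reads need any extra synchronization.
void example() {
  Config config;
  std::thread t([&config] { int retries = config.maxRetries(); (void)retries; });
  t.join();
}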


Solution

  • You do not need any barriers, because it is guaranteed that the completion of the std::thread constructor synchronizes with the start of the invocation of the function passed to it. In Standardese:

    The completion of the invocation of the constructor synchronizes with the beginning of the invocation of the copy of f.


    A somewhat formal proof: run_worker_thread_ = true; (A) is sequenced before the construction of the thread object (B), because the full-expressions of the constructor body are evaluated in order. The construction of the thread object synchronizes with the execution of the closure object (C) according to the rule cited above. Hence, A inter-thread happens before C.

    A is sequenced before B, B synchronizes with C, therefore A happens before C: that is the formal proof in Standard terms.
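
    As an illustration of that chain, here is a minimal sketch (with made-up names, not the asker's class) showing that even a plain, non-atomic flag written before the std::thread constructor is guaranteed to be visible inside the thread function:

    #include <cassert>
    #include <thread>

    struct Example {
        Example() {
            started_ = true;                  // A: plain write, sequenced before B
            worker_ = std::thread([this] {    // B: ctor completion synchronizes with C
                assert(started_);             // C: guaranteed to observe A, no lock needed
            });
        }
        ~Example() { worker_.join(); }

        bool started_ = false;                // deliberately non-atomic
        std::thread worker_;
    };

    Note that this only covers the write made before the thread is created; later writes, such as run_worker_thread_ = false in the destructor, still need the mutex (or an atomic) to synchronize with the reads in shouldRun().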

    When analyzing programs in the C++11 (and later) era, you should stick to the C++ memory and execution model and forget about barriers and reorderings that the compiler may or may not perform. Those are just implementation details. The only thing that matters is the formal proof in C++ terms. The compiler must obey: it will do (and refrain from doing) whatever is necessary to adhere to the rules.

    But for the sake of completeness, let's look at the code through the compiler's eyes and try to understand why it can't reorder anything in this case. Under the "as-if" rule, the compiler may reorder instructions as long as you cannot tell that they have been reordered. So if we set some bool flags:

    flag1 = true; // A
    flag2 = false;// B
    

    It is allowed to execute these lines as follows:

    flag2 = false;// B
    flag1 = true;// A
    

    That is allowed despite the fact that A is sequenced before B. The compiler can do this because we can't tell the difference: we can't catch it reordering our instructions just by observing the program's behavior, since apart from "sequenced before" there is no relation between these two lines. But let's get back to our case:

    run_worker_thread_ = true; // A
    worker_thread_ = std::thread(...); // B
    

    It might look as if this case is the same as the bool example above. And it would be, if we didn't know that the thread construction (besides being sequenced after the A expression) synchronizes with something (for simplicity, let's not name that something yet). But as we found out, if one thing is sequenced before a second thing that in turn synchronizes with a third thing, then the first happens before the third. So the Standard requires the A expression to happen before the something that our B expression synchronizes with.

    And this fact forbids the compiler from reordering our A and B expressions, because now we could tell the difference if it did: the C expression (that something) might not see the side effects produced by A, so just by observing the program's execution we might catch the compiler cheating. Hence, it has to use some barrier. It doesn't matter whether it is just a compiler barrier or a hardware one; it has to be there to guarantee that these instructions are not reordered. You can think of it as a release fence at the completion of the thread construction and an acquire fence at the start of the closure object's execution. That roughly describes what happens under the hood.
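
    For illustration only, here is a rough model of that fence pairing, written with explicit std::atomic_thread_fence calls around a hypothetical hand-off flag (this is a sketch of the guarantee, not what a real std::thread implementation literally does):

    #include <atomic>
    #include <cassert>
    #include <thread>

    int data = 0;                       // plays the role of run_worker_thread_
    std::atomic<bool> handoff{false};   // plays the role of the internal hand-off

    int main() {
        std::thread reader([] {
            while (!handoff.load(std::memory_order_relaxed)) {}   // wait for the hand-off
            std::atomic_thread_fence(std::memory_order_acquire);  // "acquire" at closure start
            assert(data == 1);                                     // the plain write is visible
        });
        data = 1;                                                  // plain write (A)
        std::atomic_thread_fence(std::memory_order_release);      // "release" at construction completion
        handoff.store(true, std::memory_order_relaxed);           // the hand-off itself
        reader.join();
    }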

    It also looks like you treat the mutex as some kind of magic thing that always works and needs no proof: for some reason you believe in the mutex but not in the thread. But there is no magic to it; the only guarantee a mutex gives is that a lock synchronizes with the prior unlock of the same mutex. So it provides the same kind of guarantee that the thread constructor provides.
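
    For comparison, a minimal sketch of that lock/unlock pairing, which is exactly what makes shouldRun() see the destructor's write (function names are made up for illustration):

    #include <mutex>

    std::mutex m;
    bool flag = true;                          // like run_worker_thread_

    // e.g. the destructor:
    void writer() {
        std::lock_guard<std::mutex> lock(m);
        flag = false;
    }   // unlocking here is the release operation

    // e.g. shouldRun():
    bool reader() {
        std::lock_guard<std::mutex> lock(m);   // locking acquires, synchronizes with the prior unlock
        return flag;                           // guaranteed to see the writer's store
    }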