
Do I need std::atomic<bool> or is POD bool good enough?


Consider this code:

// global
std::atomic<bool> run{true};

// thread 1
while (run) { /* do stuff */ }

// thread 2
/* do stuff until it's time to shut down */
run = false;

Do I need the overhead associated with the atomic variable here? My intuition is that reads and writes of a boolean variable are more or less atomic anyway (this is a common g++/Linux/Intel setup), and if some write/read timing weirdness makes my run loop on thread 1 stop one pass early or late, I'm not super worried about it for this application.

Or is there some other consideration I am missing here? Looking at perf, it appears my code is spending a fair amount of time in std::atomic_bool::operator bool and I'd rather have it in the loop instead.


Solution

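  • You need to use std::atomic to avoid undesired optimizations (the compiler reading the value once and then either always looping or never looping) and to get correct behavior on systems without a strongly ordered memory model. Formally, a plain bool written by one thread while another thread reads it is a data race, which is undefined behavior in C++11 and later regardless of the hardware. x86 is strongly ordered, so once the write finishes, the next read will see it; on weakly ordered systems (e.g., ARM or POWER), without the ordering guarantees that atomics provide, the write can become visible late, or out of order with respect to other memory operations.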

    You can improve the performance, though. By default, std::atomic operations (including the implicit conversion to bool and the assignment used in the question) use sequentially consistent ordering (std::memory_order_seq_cst), which is overkill for a single flag. You can relax this by calling load/store with an explicit, less strict memory ordering, so each access isn't required to use the most paranoid mode of maintaining consistency.

    For example, you could do:

    // global
    std::atomic<bool> run{true};  // `= true` only compiles from C++17 on
    
    // thread 1
    while (run.load(std::memory_order_acquire)) { /* do stuff */ }
    
    // thread 2
    /* do stuff until it's time to shut down */
    run.store(false, std::memory_order_release);
    

    On an x86 machine, any ordering less strict than the (default, strictest) sequentially consistent ordering typically does nothing beyond restricting how the compiler may reorder instructions; no bus locking or fence instructions are required, because of the strongly ordered memory model. In fact, on x86 even a sequentially consistent load compiles to an ordinary MOV; the extra cost of seq_cst there falls on the store side (an XCHG or MFENCE). So, aside from guaranteeing that the value is actually read from memory rather than cached in a register and reused, using atomics this way on x86 is essentially free, and on non-x86 machines it makes your code correct (which it otherwise wouldn't be).