c++ multithreading concurrency

Understanding Atomicity and Memory Visibility in Multithreaded counter++ Operations


I've encountered a common example in concurrency discussions involving a simple counter incremented in a multithreaded environment:

#include <thread>
#include <iostream>
#include <chrono>
// Build with -DATOMIC to make counter a std::atomic<int>; otherwise it is a plain int.
#ifdef ATOMIC
#include <atomic>

std::atomic<int>
#else
int
#endif
counter(0);

void thread() {
    for(;;) {
#ifdef ATOMIC
        counter.fetch_add(1, std::memory_order_relaxed);
#else 
        counter++;
#endif
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}

int main() {
    std::jthread t1{thread};
    std::jthread t2{thread};
    while (true) {
        std::cout << counter
#ifdef ATOMIC
        .load(std::memory_order_relaxed) 
#endif 
        << std::endl;
        std::this_thread::sleep_for(std::chrono::milliseconds(1000));
    }
}

In this code, the plain counter++ (the build without ATOMIC) can lose updates when both threads execute it concurrently, because the increment is not atomic. The general understanding is that incrementing a counter involves multiple steps:

  1. Fetch the current value of the counter.
  2. Increment this value by 1.
  3. Store the incremented value back.
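
To make the three steps concrete, here is a minimal single-threaded sketch that replays one unlucky interleaving by hand. The function name replay_lost_update and the locals a and b are hypothetical; they stand in for the private registers of two threads. No real threads are involved, so the outcome is deterministic.

```cpp
#include <cassert>

// Hypothetical helper: replays one possible interleaving of two increments.
// The locals a and b play the role of each thread's private register.
int replay_lost_update() {
    int counter = 0;

    int a = counter;  // thread A, step 1: fetch (reads 0)
    int b = counter;  // thread B, step 1: fetch (also reads 0)
    assert(a == 0 && b == 0);  // both threads start from the same stale value

    a = a + 1;        // thread A, step 2: increment its private copy
    b = b + 1;        // thread B, step 2: increment its private copy

    counter = a;      // thread A, step 3: store 1
    counter = b;      // thread B, step 3: store 1 again, overwriting A's update

    return counter;   // two increments ran, but only one is reflected
}
```

Calling replay_lost_update() returns 1 even though two increments were performed, which is the "lost update" the three-step description is hinting at.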

However, when inspecting the compiled assembly code (e.g., on Godbolt Compiler Explorer), I've noticed that counter++ is translated to a single instruction: add DWORD PTR counter[rip], 1. This observation leads me to believe that the operation is atomic at the assembly level.

My question revolves around the atomicity and memory visibility aspects of this operation in a multithreaded context:

Despite being a single assembly instruction, why is counter++ not considered atomic in a multithreaded environment?

How do memory visibility and caching affect this operation? Specifically, if the counter is not immediately available in the cache, does this lead to significant wait states or can the add instruction be interrupted and retried?

I'm trying to deepen my understanding of how such seemingly simple operations behave differently when introduced to the complexities of multithreading and memory architectures.


Solution

  • Despite being a single assembly instruction, why is counter++ not considered atomic in a multithreaded environment?

    You mean, besides the fact that the compiler is not required to compile it into "a single assembly instruction"? You cannot look at the compiled code to know what that code means with regard to the meaning of the C++ code that generated it.

    The job of a compiler is to look at C++ code, understand what it means, and translate it into something that a particular machine will execute to create that meaning. Even if one compiler happens to give counter++ the equivalent of atomicity (and no, that doesn't even apply to this case: on x86, an add to memory without a lock prefix is still a separate read and write as far as other cores are concerned), that says nothing about what the actual C++ code means.

    A compiler is just a translator, and that translation is lossy. Critical information about the meaning of the code is lost when it is converted to assembly. So you cannot look at the assembly and know what the original C++ code meant to do.
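
    As a rough illustration of how much freedom the translation has, the sketch below shows two functions that are indistinguishable to a single thread. Because counter is a plain int, an optimizing compiler is permitted (under the as-if rule) to turn the first into something like the second, so the number of add instructions in the output need not match the number of counter++ expressions in the source. The names bump_each_time, bump_once, and run are hypothetical.

```cpp
#include <cassert>

int counter = 0;  // plain int, as in the build without ATOMIC

// What the source says: n separate increments of the global.
void bump_each_time(int n) {
    for (int i = 0; i < n; ++i) {
        counter++;
    }
}

// One rewrite a compiler may legally choose for the loop above:
// a single update of the global instead of n of them.
void bump_once(int n) {
    if (n > 0) {
        counter += n;
    }
}

// Hypothetical harness: resets the counter, runs f(n), returns the result.
int run(void (*f)(int), int n) {
    counter = 0;
    f(n);
    // On a single thread both versions must leave the same value behind.
    assert(counter == (n > 0 ? n : 0));
    return counter;
}
```

    A single thread cannot tell these apart, but another thread watching counter could, and the language does not promise that observer anything for a non-atomic variable.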

    If you want to understand availability and visibility, you're not going to understand it by looking at a single compiler's output. That will only tell you how that one compiler implements the rules of availability and visibility for a particular platform. If you want to understand the rules, you need to understand how they work in the higher level language that defines those rules.
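
    For contrast, here is a minimal sketch of stating the requirement in the language itself rather than hoping for a particular instruction. It assumes a hypothetical helper, count_with_atomic, that runs a fixed number of increments on several threads; std::atomic<int> and fetch_add are the same facilities used in the ATOMIC build of the question's code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each add 1 to a shared atomic
// counter increments_per_thread times; returns the final total.
int count_with_atomic(int num_threads, int increments_per_thread) {
    assert(num_threads >= 0 && increments_per_thread >= 0);

    std::atomic<int> counter{0};
    std::vector<std::thread> workers;

    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Atomic read-modify-write: no increment can be lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }

    // join() synchronizes with each thread's completion, so the load
    // below observes every increment.
    for (auto& w : workers) {
        w.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

    With this version, count_with_atomic(4, 10000) yields exactly 40000 on any conforming implementation. Relaxed ordering is enough here because only the count itself matters; it makes no promise about the ordering of other memory accesses around the increment.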