Tags: c++, multithreading, synchronization, stdatomic

Non-deterministic read values when using std::atomic store/load with std::memory_order_seq_cst


I started learning about memory orderings in C++ using std::atomic, and I'm trying to understand the synchronization between a store and a subsequent load of an atomic variable from two different threads. Suppose we call the load and the store from two different threads using the default memory order std::memory_order_seq_cst, like this:

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> data(0);

void func() {
  data.store(1234, std::memory_order_seq_cst);
}

int main() {
  std::thread t(func);
  int val = data.load(std::memory_order_seq_cst);
  std::cout << "value: " << val << std::endl;
  t.join();
  return 0;
}

I'm seeing non-deterministic output (most of the time 0, but sometimes 1234). I had learned that atomic loads and stores happen in program order, which I took to mean they are synchronized, so what I'm seeing contradicts what I thought I knew. What's the gap in my understanding? Is it that while the store and load are ordered, memory itself is not coherent between the two threads? (By the way, I compiled the program with g++ -std=c++17 -pthread main.cpp -o main.)

I compiled and ran the program above.


Solution

  • TL;DR: as Jesper Juhl commented:
    Sometimes the new thread runs first, sometimes it doesn't. Simple as that.


    The whole point of threads is that they can run independently of each other. seq_cst means there is a single total order that is some interleaving of each thread's program order, but there's no guarantee which interleaving you'll get.

    Both orders are allowed: the store first and the load second, or vice versa. With only one atomic operation in each thread, and no other shared data, seq_cst isn't doing anything that relaxed wouldn't.
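
    For example, here is a minimal sketch of that claim, with the same store/load shape as the question but using std::memory_order_relaxed; it can still print either 0 or 1234, the same set of possible outcomes as the seq_cst version:

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<int> data(0);

    int main() {
      // Whether the new thread's store or main's load runs first is still up to scheduling.
      std::thread t([] { data.store(1234, std::memory_order_relaxed); });
      std::cout << "value: " << data.load(std::memory_order_relaxed) << std::endl;  // 0 or 1234
      t.join();
    }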


    Your program doesn't do anything to guarantee or require that the load runs before or after the store. For example, putting the load before the std::thread t(func); constructor, or after the t.join(), would each create a happens-before relationship between the load and the store.

    In your current program, it's just up to chance and the OS's scheduling decisions on thread creation whether data.load runs before or after data.store runs (and the data goes through the store buffer and commits to cache, becoming globally visible).
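
    As a sketch of the join-first variant mentioned above (reusing the question's data and func): once the load is placed after t.join(), the store happens-before the load, so the program always prints 1234.

    #include <atomic>
    #include <iostream>
    #include <thread>

    std::atomic<int> data(0);

    void func() {
      data.store(1234, std::memory_order_seq_cst);
    }

    int main() {
      std::thread t(func);
      t.join();  // thread completion synchronizes-with the return from join()
      // The store happens-before this load, so it must observe 1234.
      std::cout << "value: " << data.load(std::memory_order_seq_cst) << std::endl;
    }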


    Is it that while the store and load is ordered, the memory itself is not coherent between the two threads?

    No, C++ guarantees coherence: each atomic object has a single modification order that all threads agree on, and if a store happens before a load of the same object, the load sees that store's value or a later one in the modification order, never an older value.

    (It would be really hard to actually make a C++ implementation on hardware without coherent cache, since C++ requires that separate threads can modify adjacent char objects in an array without interfering with each other, among other things. If two threads had dirty copies of the same line, they'd need per-byte dirty bitmaps to merge on commit if they wanted to avoid stepping on the other thread's store during write-back. All real hardware has coherent cache between cores that std::thread can run threads across, typically with MESI so a core doing a store has to get exclusive ownership of the cache line first (invalidating all other copies), before modifying it.)
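
    As a hedged illustration of that language requirement (not of any particular hardware behavior): two threads writing disjoint, adjacent char elements of the same array, which will usually share a cache line, must both have their writes preserved.

    #include <cassert>
    #include <thread>

    char buf[2] = {0, 0};  // adjacent bytes, almost certainly in the same cache line

    int main() {
      std::thread a([] { buf[0] = 'A'; });  // each thread writes only its own byte
      std::thread b([] { buf[1] = 'B'; });
      a.join();
      b.join();
      // buf[0] and buf[1] are distinct memory locations, so there's no data race
      // and neither write is allowed to clobber the other.
      assert(buf[0] == 'A' && buf[1] == 'B');
    }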


    Related:

    • C++ atomics reading stale value - near duplicate. You only get synchronization between threads if the load does happen to see a value stored by another thread (see the sketch after this list).
    • Sequentially consistent fence - C++ memory barriers (and operations with non-relaxed memory orders) aren't like pthread_barrier_wait() synchronization primitives that wait for all threads to reach them (wikipedia).
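
    As a minimal sketch of the first point above (the payload and ready names are made up for illustration): an acquire load only synchronizes-with the release store once it actually observes the stored value, which is why the reader spins until it sees the flag set.

    #include <atomic>
    #include <cassert>
    #include <thread>

    int payload = 0;                    // non-atomic data, published via the flag
    std::atomic<bool> ready(false);

    int main() {
      std::thread writer([] {
        payload = 42;                                   // A: plain store
        ready.store(true, std::memory_order_release);   // B: publish the payload
      });
      // Spin until the load actually sees the value stored by the writer;
      // only then does release/acquire give a happens-before edge covering A.
      while (!ready.load(std::memory_order_acquire)) { }
      assert(payload == 42);            // guaranteed once the flag was seen as true
      writer.join();
    }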