C++: __sync_synchronize() still needed with std::atomic?

I've been running into an infrequent but re-occurring race condition. The program has two threads and uses std::atomic. I'll simplify the critical parts of the code to look like:

std::atomic<uint64_t> b;  // flag, initialized to 0
uint64_t data[100];  // shared data, initialized to 0

thread 1 (publishing):

// set various shared variables here, for example
data[5] = 10;

uint64_t a = b.exchange(1);  // signal to thread 2 that data is ready

thread 2 (receiving):

if (b.load() != 0) {  // signal that data is ready
  // read various shared variables here, for example:
  uint64_t x = data[5];
  // race condition sometimes (x sometimes not consistent)
}

The odd thing is that when I add __sync_synchronize() to each thread, then the race condition goes away. I've seen this happen on two different servers.

i.e. when I change the code to look like the following, then the problem goes away:

thread 1 (publishing):

// set various shared variables here, for example
data[5] = 10;

__sync_synchronize();
uint64_t a = b.exchange(1);  // signal to thread 2 that data is ready

thread 2 (receiving):

if (b.load() != 0) {  // signal that data is ready
  __sync_synchronize();
  // read various shared variables here, for example:
  uint64_t x = data[5];
}

Why is __sync_synchronize() necessary? It seems redundant as I thought both exchange and load ensured the correct sequential ordering of logic.

Architecture is x86_64 processors, linux, g++ 4.6.2

Solution

Whilst it is impossible to say from your simplified code what actually goes on in your actual application, the fact that __sync_synchronize helps, and the fact that this function is a memory barrier tells me that you are writing things in the one thread that the other thread is reading, in a way that isn't atomic.

An example:

thread_1:

    object *p = new object;
    p->x = 1;
    b.exchange(p);   /* give pointer p to other thread */

thread_2:

    object *p = b.load();
    if (p->x == 1) do_stuff();
    else error("Huh?");

This may very well trigger the error-path in thread2, because the write to p->x has not actually been completed when thread 2 reads the new pointer value p.

Adding memory barrier, in this case, in the thread_1 code should fix this. Note that for THIS case, a memory barrier in thread_2 will not do anything - it may alter the timing and appear to fix the problem, but it won't be the right thing. You may need memory barriers on both sides still, if you are reading/writing memory that is shared between two threads.

I understand that this may not be precisely what your code is doing, but the concept is the same - __sync_synchronize ensures that memory reads and memory writes have completed for ALL of the instructions before that function call [which isn't a real function call, it will inline a single instruction that waits for any pending memory operations to comlete].

Noteworthy is that operations on std::atomic ONLY affect the actual data stored in the atomic object. Not reads/writes of other data.

Sometimes you also need a "compiler barrier" to avoid the compiler moving stuff from one side of an operation to another:

  std::atomic<bool> flag(false);
  value = 42;
  flag.store(true);

  ....

another thread:

  while(!flag.load());
  print(value);

Now, there is a chance that the compiler generates the first form as:

  flag.store(true);
  value = 42;

Now, that wouldn't be good, would it? std::atomic is guaranteed to be a "compiler barrier", but in other cases, the compiler may well shuffle stuff around in a similar way.