Search code examples
c++linuxatomic

C++: __sync_synchronize() still needed with std::atomic?


I've been running into an infrequent but re-occurring race condition. The program has two threads and uses std::atomic. I'll simplify the critical parts of the code to look like:

std::atomic<uint64_t> b;  // flag, initialized to 0
uint64_t data[100];  // shared data, initialized to 0

thread 1 (publishing):

// set various shared variables here, for example
data[5] = 10;

uint64_t a = b.exchange(1);  // signal to thread 2 that data is ready

thread 2 (receiving):

if (b.load() != 0) {  // signal that data is ready
  // read various shared variables here, for example:
  uint64_t x = data[5];
  // race condition sometimes (x sometimes not consistent)
}

The odd thing is that when I add __sync_synchronize() to each thread, then the race condition goes away. I've seen this happen on two different servers.

i.e. when I change the code to look like the following, then the problem goes away:

thread 1 (publishing):

// set various shared variables here, for example
data[5] = 10;

__sync_synchronize();
uint64_t a = b.exchange(1);  // signal to thread 2 that data is ready

thread 2 (receiving):

if (b.load() != 0) {  // signal that data is ready
  __sync_synchronize();
  // read various shared variables here, for example:
  uint64_t x = data[5];
}

Why is __sync_synchronize() necessary? It seems redundant as I thought both exchange and load ensured the correct sequential ordering of logic.

Architecture is x86_64 processors, linux, g++ 4.6.2


Solution

  • Whilst it is impossible to say from your simplified code what actually goes on in your actual application, the fact that __sync_synchronize helps, and the fact that this function is a memory barrier tells me that you are writing things in the one thread that the other thread is reading, in a way that isn't atomic.

    An example:

    thread_1:
    
        object *p = new object;
        p->x = 1;
        b.exchange(p);   /* give pointer p to other thread */
    
    thread_2:
    
        object *p = b.load();
        if (p->x == 1) do_stuff();
        else error("Huh?");
    

    This may very well trigger the error-path in thread2, because the write to p->x has not actually been completed when thread 2 reads the new pointer value p.

    Adding memory barrier, in this case, in the thread_1 code should fix this. Note that for THIS case, a memory barrier in thread_2 will not do anything - it may alter the timing and appear to fix the problem, but it won't be the right thing. You may need memory barriers on both sides still, if you are reading/writing memory that is shared between two threads.

    I understand that this may not be precisely what your code is doing, but the concept is the same - __sync_synchronize ensures that memory reads and memory writes have completed for ALL of the instructions before that function call [which isn't a real function call, it will inline a single instruction that waits for any pending memory operations to comlete].

    Noteworthy is that operations on std::atomic ONLY affect the actual data stored in the atomic object. Not reads/writes of other data.

    Sometimes you also need a "compiler barrier" to avoid the compiler moving stuff from one side of an operation to another:

      std::atomic<bool> flag(false);
      value = 42;
      flag.store(true);
    
      ....
    

    another thread:

      while(!flag.load());
      print(value); 
    

    Now, there is a chance that the compiler generates the first form as:

      flag.store(true);
      value = 42;
    

    Now, that wouldn't be good, would it? std::atomic is guaranteed to be a "compiler barrier", but in other cases, the compiler may well shuffle stuff around in a similar way.