Search code examples
c++multithreadingc++11atomiclock-free

Lock free single producer/single consumer circular buffer


I have been looking at a lock free single producer/single consumer circular buffer on this website when I couldn't figure out why a specific memory barrier was needed. I have carefully read a hundread time the standard rules about memory order but I don't understand what I'm missing.

With this implementation, there is only a unique thread which can call the push() function and another unique thread which can call the pop() function.

Here is the Producer code:

bool push(const Element& item)
{       
  const auto current_tail = _tail.load(std::memory_order_relaxed);  //(1)
  const auto next_tail = increment(current_tail);

  if(next_tail != _head.load(std::memory_order_acquire))            //(2)               
  {     
    _array[current_tail] = item;                                    //(3)
    _tail.store(next_tail, std::memory_order_release);              //(4)
    return true;
  }
  return false; // full queue
}

Here is the Consumer code:

bool pop(Element& item)
{
  const auto current_head = _head.load(std::memory_order_relaxed);    //(1)
  if(current_head == _tail.load(std::memory_order_acquire))           //(2)
    return false; // empty queue

  item = _array[current_head];                                       //(3)
  _head.store(increment(current_head), std::memory_order_release);   //(4)
  return true;
}

I understand why the Producer (4) and the Consumer (2) statements are absolutly needed, this is because we have to make sure that all writes that happened before the (4) released store by the Producerwill be visible side effects once the consumer will see the stored value.

I also understand why the Consumer (4) statement is needed, this is to make sure that the Consumer (3) load will be performed before the Consumer (4) store will be performed.

The question

  • Why is the Producer (2) load needs to be performed with acquire semantic (instead of relaxed)? Is it to prevent Producer (3) or (4) to be reodered before the condition (at compile time or at runtime)?

Solution

  • we need prove that

    _array[current_tail] = item; // push(3)
    

    excecuted after conforming (current_head == current_tail)

    item = _array[current_head]; // pop(3)
    

    is completed. we can overwrite cell, only after data from it already copied to item

    the

    _head.load(std::memory_order_acquire) // push(2)
    

    synchronized with

    _head.store(increment(current_head), std::memory_order_release);   //pop(4)
    

    via Release-Acquire ordering:

    all memory writes ( pop(3) ) that happened-before the atomic store-release ( pop(4) ) on _head become visible side-effects once the atomic load-acquire ( push(2) ) is completed on _head.

    so Producer code after push(2) is completed, guaranteed to see result of pop(3). this mean that data from _array[current_head] is copied to item and result of this operation is visible for Producer code after push(2), so _array[current_head] already free.

    from another side from memory_order_acquire load description - no reads or writes ( push(3) ) in the current thread can be reordered before this load. so push(3) will be execute already after push(2) load complete, but at this point pop(3) already completed

    item = _array[current_head];                                        //pop(3)
    _head.store(increment(current_head), std::memory_order_release);    //pop(4)
    -----
        _head.load(std::memory_order_acquire);                          //push(2)
        _array[current_tail] = item;                                    //push(3)