Search code examples
c++multithreadingc++11relaxed-atomics

C++ countdown in CyclicBarrier going wrong using atomic variables [solutions without locks please]


I am trying to implement a cyclic barrier in C++ from scratch. Aim is to implement as conformant to Java implementation as possible. The class reference is here. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/CyclicBarrier.html

Now in my testing the returnStatus should be for each thread which successfully trips the barrier , a value ranging from barrierLimit-1 to zero. I am trying to achieve this using atomic variables and memory fence. but my code is failing testing and in some cases two threads are same value of returnStatus.

Would some one please suggest if any technique can be helpful to resolve this. I want to solve this without using locks so that i can truly apply the lockless behaviour as much as possible.

The full code reference is at : https://github.com/anandkulkarnisg/CyclicBarrier/blob/master/CyclicBarrier.cpp

Sample test case result is below [ buggy case ]:

I am currently in thread id = 140578053969664.My barrier state count is = 4
I am currently in thread id = 140577877722880.My barrier state count is = 2
I am currently in thread id = 140577550407424.My barrier state count is = 1
I am currently in thread id = 140577936471808.My barrier state count is = 2
I am currently in thread id = 140577760225024.My barrier state count is = 0


The code snippet is below.

        // First check and ensure that the barrier is in good / broken state.
        if(!m_barrierState && !m_tripStatus)
        {
            // First check the status of the variable and immediately exit throwing exception if the count is zero.
            int returnResult;
            if(m_count == 0)
                throw std::string("The barrier has already tripped. Pleas reset the barrier before use again!!" + std::to_string(returnResult));

            // First ensure that the current wait gets the waiting result assigned immediately.

            std::atomic_thread_fence(std::memory_order_acquire);
            m_count.fetch_sub(1, std::memory_order_seq_cst);
            returnResult = m_count.load();
    std::atomic_thread_fence(std::memory_order_release);

Solution

  • std::atomic_thread_fence(std::memory_order_acquire);
    m_count.fetch_sub(1, std::memory_order_seq_cst);      // [1]
    returnResult = m_count.load();                        // [2]
    std::atomic_thread_fence(std::memory_order_release);
    

    [2] multiple threads are doing this step at the same time. std::atomic_thread_fence does not prevent other threads from running the same code at the same time. That's how 2 threads can end up with the same value.

    Instead, catch the return value of the fetch_sub on the line marked with [1]

    returnResult = m_count.fetch_sub(1, std::memory_order_seq_cst) - 1;

    btw, I'm pretty sure you don't need the fences here. (I can't really tell without seeing more of the function.) If you do, you might just switch returnResult to be an atomic instead.

    It looks like you're using fences as if they were transactional memory. They are not. The release essentially controls guarantees of ordering of stores when perceived by any CPU that uses an acquire. As long it doesn't break the ordering guarantees, the write is free to propagate before the release is actually processed. As a thought experiment, imagine that [1] is executed, then a context switch occurs, a million years passes, then [2] is executed. It's now clearly absurd to assume that m_count holds the same value that it did a million years ago. The release may flush the write buffer, but it's possible that the change was flushed already.

    Lastly, weird stuff can happen if you mix seq_cst with acquire / release semantics. Sorry that's vague, but I don't understand it well enough to try to explain it.