C++: Reordering atomic store (release) and load (acquire)

I have written the following code, which acts a bit like a synchronous queue for one writer and one reader. Never more than 1 reader and 1 writer.

The writer repeatedly calls maybePublish which is designed to be lock-free. The reader, instead, uses spinUntilFreshAndFetch. It notifies through the atomic variable that it wants the next, very fresh item. After storing, it spins on the atomic variable, waiting for the writer to set it back to 0, after which it can take the shared object and place it into its own copy.

#include <atomic>
#include <chrono>
#include <thread>

class Shared {
public:
    void maybePublish(const Item &item) {
        if (mItemSync.load(std::memory_order_acquire) == 1) {
            mItem = item;
            mItemSync.store(0, std::memory_order_release);
        }
    }

    void spinUntilFreshAndFetch(Item *copy) {
        mItemSync.store(1, std::memory_order_release);  // A
        while (mItemSync.load(std::memory_order_acquire) != 0) {  // B
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        *copy = mItem;
    }
private:
    Item mItem;
    std::atomic_int32_t mItemSync{0};  // brace-init also compiles before C++17
};

My worry is about lines A and B. I can't find anything in the standard that prevents these two lines from being swapped. As I understand it, the standard guarantees that a release won't float above an acquire, but not that an acquire can't float above a release.

Also, I worry that it might be otherwise optimized. For example, could the compiler assume that, at B, mItemSync cannot be anything else but 1 (from line A), and turn this into an infinite loop?

According to a tutorial I saw, A and B cannot be reordered if I use std::memory_order_seq_cst instead. Should I do this?

Thanks for any advice!

Solution

The program is fine as is.

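Atomic operations give you three things here. First, operations on the same atomic object cannot be reordered with respect to each other within a single thread, so the load at B can never observe a value older than the store at A. Second, the compiler must assume that another thread may change the value, so it cannot hoist the load out of the loop and turn it into an infinite loop. Third, each operation is atomic (indivisible).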
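For contrast, the store/load reordering you are worried about only becomes observable when the store and the load target two different atomic objects. The sketch below is the classic "store buffering" pattern, using two hypothetical flags x and y that are not part of your code. With std::memory_order_seq_cst on every operation, both threads reading 0 is impossible; with release/acquire it would be allowed. Your code uses a single atomic variable, so this case does not apply to it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical flags for illustration; not part of the code in the question.
std::atomic<int> x{0};
std::atomic<int> y{0};

// Runs one store-buffering round and returns true if both threads read 0.
bool bothReadZero() {
    x.store(0, std::memory_order_seq_cst);
    y.store(0, std::memory_order_seq_cst);
    int r1 = -1, r2 = -1;

    std::thread t1([&r1] {
        x.store(1, std::memory_order_seq_cst);   // store to x ...
        r1 = y.load(std::memory_order_seq_cst);  // ... then load from a different object
    });
    std::thread t2([&r2] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });

    t1.join();
    t2.join();
    // Under seq_cst all four operations fall into one total order,
    // so at least one thread must see the other's store.
    return r1 == 0 && r2 == 0;
}
```

Calling bothReadZero() in a loop should never return true as written. Weakening the orderings to release/acquire would make that outcome permitted, although it may still be rare in practice.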

Acquire and release semantics mean: if an acquire load observes the value written by a release store, then everything the releasing thread wrote before that store is visible to the acquiring thread after the load.
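A minimal sketch of that guarantee, using hypothetical names (payload, ready) rather than anything from your class: the plain write to payload is sequenced before the release store, so once the acquire load sees the flag, the read of payload is safe.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration only.
int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that carries the synchronization

// Returns the value the consumer thread reads from payload.
int messagePassingDemo() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;

    std::thread producer([] {
        payload = 42;                                  // sequenced before the release store
        ready.store(true, std::memory_order_release);  // publish
    });
    std::thread consumer([&seen] {
        // Spin until the release store is observed.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // the write of 42 is visible here
    });

    producer.join();
    consumer.join();
    return seen;
}
```

This is the same shape as your maybePublish / spinUntilFreshAndFetch pair, just in one direction only.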

Let us denote a release as } and an acquire as {. By their semantics, nothing enclosed between the braces can move outside them. Your two threads then look like this:

reader             } {  {  {{   {   { R
writer {    {   {{    {  W       }
                   ^  ^          ^  ^
                   1  2          3  4

1. The writer repeatedly tries to acquire in order to publish; this fails until the reader releases (the store at A).
2. Some time after the reader's release, the writer's acquire succeeds.
3. Meanwhile, the reader repeatedly tries to acquire (the load at B), which fails until the writer releases.
4. The reader's acquire succeeds.

Notice that these four operations must happen in this order. The write to mItem (W) is guaranteed to fall between 2 and 3, and the read (R) must happen after 4. Repeating ("tiling") this pattern for subsequent rounds preserves the same property, so the program is fine.
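If you want to convince yourself empirically, here is a rough test harness. It is only a sketch: the Item layout (seq plus its negation) and the sharedDemo function are assumptions made for illustration, since the question does not show Item. A passing run does not prove correctness, but a torn or stale copy would show up as a failed check.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <thread>

// Hypothetical payload type; the question does not show Item.
struct Item {
    std::int64_t seq;
    std::int64_t negSeq;  // always -seq, so a torn copy would be detectable
};

class Shared {
public:
    void maybePublish(const Item &item) {
        if (mItemSync.load(std::memory_order_acquire) == 1) {
            mItem = item;
            mItemSync.store(0, std::memory_order_release);
        }
    }

    void spinUntilFreshAndFetch(Item *copy) {
        mItemSync.store(1, std::memory_order_release);            // A
        while (mItemSync.load(std::memory_order_acquire) != 0) {  // B
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        *copy = mItem;
    }

private:
    Item mItem{};
    std::atomic<std::int32_t> mItemSync{0};
};

// Returns true if every fetched item is internally consistent
// and newer than the previously fetched one.
bool sharedDemo(int rounds) {
    Shared shared;
    std::atomic<bool> stop{false};

    // Writer: offers an ever-increasing sequence number until told to stop.
    std::thread writer([&] {
        for (std::int64_t i = 1; !stop.load(std::memory_order_relaxed); ++i) {
            shared.maybePublish(Item{i, -i});
        }
    });

    bool ok = true;
    std::int64_t last = 0;
    for (int r = 0; r < rounds; ++r) {
        Item got{};
        shared.spinUntilFreshAndFetch(&got);
        ok = ok && (got.seq + got.negSeq == 0) && (got.seq > last);
        last = got.seq;
    }

    stop.store(true, std::memory_order_relaxed);
    writer.join();
    return ok;
}
```

Running this under a race detector such as ThreadSanitizer is a more meaningful check than the assertions alone, since it reports unsynchronized accesses to mItem directly.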