Do I need a memory barrier for a change notification flag between threads?

I need a very fast (in the sense "low cost for reader", not "low latency") change notification mechanism between threads in order to update a read cache:

The situation

Thread W (Writer) updates a data structure (S) (in my case a setting in a map) only once in a while.

Thread R (Reader) maintains a cache of S and reads it very frequently. When Thread W updates S, Thread R needs to be notified of the update in reasonable time (10-100ms).

Architecture is ARM, x86 and x86_64. I need to support C++03 with gcc 4.6 and higher.

Code

is something like this:

// variables shared between threads
bool updateAvailable;
SomeMutex dataMutex;
std::string myData;

// variables used only in Thread R
std::string myDataCache;

// Thread W
dataMutex.Lock();   // lock the mutex object, not the SomeMutex type
myData = "newData";
updateAvailable = true;
dataMutex.Unlock();

// Thread R

if(updateAvailable) // fast path: no lock taken when nothing has changed
{
    dataMutex.Lock();
    myDataCache = myData;
    updateAvailable = false;
    dataMutex.Unlock();
}

doSomethingWith(myDataCache);

My Question

In Thread R no locking or barriers occur in the "fast path" (no update available). Is this an error? What are the consequences of this design?

Do I need to qualify updateAvailable as volatile?

Will R get the update eventually?

My understanding so far

Is it safe regarding data consistency?

This looks a bit like "Double Checked Locking". According to http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html a memory barrier can be used to fix it in C++.

However the major difference here is that the shared resource is never touched/read in the Reader fast path. When updating the cache, the consistency is guaranteed by the mutex.

Will R get the update?

Here is where it gets tricky. As I understand it, the CPU running Thread R could cache updateAvailable indefinitely, effectively moving the Read way way before the actual if statement.

So the update could take until the next cache flush, for example when another thread or process is scheduled.

Solution

Do I need to qualify updateAvailable as volatile?

Since volatile has nothing to do with the threading model in C++, you should use atomics to make your program strictly standard-conformant:

With C++11 or newer, the preferred way is to use std::atomic<bool> with memory_order_relaxed stores and loads:

std::atomic<bool> updateAvailable(false); // requires #include <atomic>

//Writer
....
updateAvailable.store(true, std::memory_order_relaxed); //set (under mutex locked)

// Reader

if(updateAvailable.load(std::memory_order_relaxed)) // check
{
    ...
    updateAvailable.store(false, std::memory_order_relaxed); // clear (under mutex locked)
    ....
}
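
Putting the fragments above together, a minimal self-contained C++11 sketch might look like the following. The class name CachedSetting and its methods publish and read are hypothetical (they do not appear in the question), and std::mutex is assumed as a stand-in for SomeMutex.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>

// Hypothetical wrapper; names are illustrative, not taken from the question.
class CachedSetting {
public:
    CachedSetting() : updateAvailable_(false) {}

    // Thread W: replace the shared value and raise the flag, both under the mutex.
    void publish(const std::string& value) {
        std::lock_guard<std::mutex> lock(dataMutex_);
        myData_ = value;
        updateAvailable_.store(true, std::memory_order_relaxed);
    }

    // Thread R: the fast path is a single relaxed load. The mutex is taken
    // only when the flag reports that new data is waiting.
    const std::string& read() {
        if (updateAvailable_.load(std::memory_order_relaxed)) {
            std::lock_guard<std::mutex> lock(dataMutex_);
            myDataCache_ = myData_;
            updateAvailable_.store(false, std::memory_order_relaxed);
        }
        return myDataCache_;
    }

private:
    std::atomic<bool> updateAvailable_;  // the only variable accessed without the mutex
    std::mutex dataMutex_;               // protects myData_
    std::string myData_;                 // shared, written by Thread W
    std::string myDataCache_;            // touched by Thread R only
};
```

Thread W would call publish("newData") occasionally, while Thread R calls read() on every iteration and pays only for the relaxed load when nothing has changed.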

gcc has supported similar functionality with its __atomic builtins since 4.7.
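
As a rough sketch of what that could look like without the C++11 library, the flag accesses can be wrapped in small helpers built on __atomic_store_n and __atomic_load_n. The helper names setUpdateAvailable and isUpdateAvailable are hypothetical, and the flag is assumed to be a plain int.

```cpp
#include <cassert>

// Assumes gcc 4.7 or newer (or a compiler that implements the same builtins).
static int updateAvailable = 0;

// Hypothetical helper: relaxed atomic store, no fence implied.
inline void setUpdateAvailable(int value) {
    __atomic_store_n(&updateAvailable, value, __ATOMIC_RELAXED);
}

// Hypothetical helper: relaxed atomic load, no fence implied.
inline bool isUpdateAvailable() {
    return __atomic_load_n(&updateAvailable, __ATOMIC_RELAXED) != 0;
}
```

The Writer would call setUpdateAvailable(1) and the Reader would clear the flag with setUpdateAvailable(0), both while holding the mutex; the Reader's fast-path check is isUpdateAvailable().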

As for gcc 4.6, there seems to be no strictly conformant way to avoid fences when accessing the updateAvailable variable. In practice, a memory fence costs far less than the 10-100ms time scale you need, so you can use gcc's own __sync atomic builtins:

int updateAvailable = 0;

//Writer
...
__sync_fetch_and_or(&updateAvailable, 1); // set to non-zero
....

//Reader
if(__sync_fetch_and_and(&updateAvailable, 1)) // check, but never change
{
    ...
    __sync_fetch_and_and(&updateAvailable, 0); // clear
    ...
}
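
For a C++03 code base, the same idea could be sketched end to end as below. This assumes POSIX threads are available and uses pthread_mutex_t in place of SomeMutex; the class name CachedSetting03 and its methods are hypothetical.

```cpp
#include <cassert>
#include <pthread.h>
#include <string>

// Hypothetical C++03 wrapper; pthread_mutex_t is an assumed stand-in for SomeMutex.
class CachedSetting03 {
public:
    CachedSetting03() : updateAvailable_(0) { pthread_mutex_init(&dataMutex_, 0); }
    ~CachedSetting03() { pthread_mutex_destroy(&dataMutex_); }

    // Thread W: update the shared value and set the flag under the mutex.
    void publish(const std::string& value) {
        pthread_mutex_lock(&dataMutex_);
        myData_ = value;
        __sync_fetch_and_or(&updateAvailable_, 1);       // set to non-zero
        pthread_mutex_unlock(&dataMutex_);
    }

    // Thread R: AND with 1 returns the old value and leaves a 0/1 flag unchanged.
    const std::string& read() {
        if (__sync_fetch_and_and(&updateAvailable_, 1)) {
            pthread_mutex_lock(&dataMutex_);
            myDataCache_ = myData_;
            __sync_fetch_and_and(&updateAvailable_, 0);  // clear
            pthread_mutex_unlock(&dataMutex_);
        }
        return myDataCache_;
    }

private:
    int updateAvailable_;        // accessed only through __sync builtins
    pthread_mutex_t dataMutex_;  // protects myData_
    std::string myData_;         // shared, written by Thread W
    std::string myDataCache_;    // touched by Thread R only
};
```

Note that the fast-path check here is a full-barrier read-modify-write rather than a plain load, so it is more expensive than the relaxed variant, though still cheap compared with taking the mutex on every read.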

Is it safe regarding data consistency?

Yes, it is safe. Your reasoning is absolutely correct here:

the shared resource is never touched/read in the Reader fast path.

This is NOT double-checked locking!

That is explicitly stated in the question itself.

When updateAvailable is false, the Reader thread uses the variable myDataCache, which is local to that thread (no other thread uses it). With the double-checked locking scheme, all threads use the shared object directly.

Why memory fences/barriers are NOT NEEDED here

The only variable accessed concurrently is updateAvailable. The myData variable is accessed under mutex protection, which provides all the needed fences. myDataCache is local to the Reader thread.

When the Reader thread sees updateAvailable as false, it uses the myDataCache variable, which is modified only by that thread itself. Program order guarantees correct visibility of changes in that case.

As for visibility guarantees for the variable updateAvailable, the C++11 standard provides such guarantees for atomic variables even without fences. 29.3 p13 says:

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

Jonathan Wakely has confirmed in chat that this paragraph applies even to memory_order_relaxed accesses.