Does std::mutex enforce cache coherence?

I have a non-atomic variable my_var and an std::mutex my_mut. I assume up to this point in the code, the programmer has followed this rule:

Each time the programmer modifies or writes to my_var, he locks and unlocks my_mut.

Assuming this, Thread1 performs the following:

my_mut.lock();
my_var.modify();
my_mut.unlock();

Here is the sequence of events I imagine in my mind:

Prior to my_mut.lock();, there were possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, even if the programmer followed the rule.
By the instruction my_mut.lock();, all writes from the previously executed my_mut critical section are visible in memory to this thread.
my_var.modify(); executes.
After my_mut.unlock();, there are possibly multiple copies of my_var in main memory and some local caches. These values do not necessarily agree, even if the programmer followed the rule. The value of my_var at the end of this thread will be visible to the next thread that locks my_mut, by the time it locks my_mut.

I have been having trouble finding a source that verifies that this is exactly how std::mutex should work. I consulted the C++ standard. From ISO 2013, I found this section:

[ Note: For example, a call that acquires a mutex will perform an acquire operation on the locations comprising the mutex. Correspondingly, a call that releases the same mutex will perform a release operation on those same locations. Informally, performing a release operation on A forces prior side effects on other memory locations to become visible to other threads that later perform a consume or an acquire operation on A.

Is my understanding of std::mutex correct?

Solution

C++ operates on the relations between operations not some particular hardware terms (like cache cohesion). So C++ Standard has a happens-before relationship which roughly means that whatever happened before completed all its side-effects and therefore is visible at the moment that happened after.

And given you have an exclusive critical session to which you have entered means that whatever happens within it, happens before the next time this critical section is entered. So any consequential entering to it will see everything happened before. That's what the Standard mandates. Everything else (including the cache cohesion) is the implementation's duty: it has to make sure that the described behavior is coherent with what actually happens.