c# .net multithreading memory-model memory-barriers

When can I guarantee that a value changed on one thread is visible to other threads?

I understand that a thread can cache a value and ignore changes made on another thread, but I'm wondering about a variation of this. Is it possible for a thread to change a cached value, which then never leaves its cache and so is never visible to other threads?

For example, could this code print "Flag is True" because thread a never sees the change that thread b makes to flag? (I can't make it do so, but I can't prove that it, or some variation of it, wouldn't.)

using System;
using System.Threading;

var flag = true;
var a = new Thread(() => {
    Thread.Sleep(200);
    Console.WriteLine($"Flag is {flag}");
});
var b = new Thread(() => {
    flag = false;
    while (true) {
        // Do something to avoid a memory barrier
    }
});

a.Start();
b.Start();

a.Join();

I can imagine that on thread b flag could be cached in a CPU register where it is then set to false, and when b enters the while loop it never gets the chance to (or never cares to) write the value of flag back to memory, hence a always sees flag as true.

From the memory barrier generators listed in this answer, this seems to me to be possible in theory. Am I correct? I haven't been able to demonstrate it in practice. Can anyone come up with an example that does?

Solution

Is it possible for a thread to change a cached value, which then never leaves its cache and so is never visible to other threads?

If we're talking literally about the hardware caches, then we need to talk about specific processor families. And if you're working (as seems likely) on x86 (and x64), you need to be aware that those processors actually have a far stronger memory model than is required for .NET. In x86 systems, the caches maintain coherency, and so no write can be ignored by other processors.

If we're talking about the optimization in which a memory location is read into a processor register and subsequent reads reuse the register instead of going back to memory, then there is no analogue on the write side. Note that there is always at least one read from the actual memory location before the code assumes that nothing else is changing that location and reuses the register.
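To make the read-side case concrete, here is a minimal sketch of a polling loop that does not depend on the JIT leaving the read in place. The class name StopFlagSketch and the field _stop are hypothetical, chosen only for illustration. With a plain read of _stop, the JIT would be allowed to load the field once and spin on the register copy; Volatile.Read rules that out.

```csharp
using System;
using System.Threading;

public static class StopFlagSketch
{
    // Hypothetical shared field; the name is illustrative only.
    private static bool _stop;

    // Returns true if the worker observed the write and exited within the timeout.
    public static bool Run()
    {
        Volatile.Write(ref _stop, false);

        var worker = new Thread(() =>
        {
            // Volatile.Read must be performed on every iteration, so the read
            // cannot be hoisted out of the loop and served from a register.
            while (!Volatile.Read(ref _stop))
            {
                Thread.SpinWait(10);
            }
        });
        worker.IsBackground = true; // do not keep the process alive if something goes wrong
        worker.Start();

        Thread.Sleep(50);
        Volatile.Write(ref _stop, true);

        // Generous timeout; the worker normally exits almost immediately.
        return worker.Join(TimeSpan.FromSeconds(5));
    }
}
```

Calling StopFlagSketch.Run() should return true. Replacing Volatile.Read with a plain read of _stop may still work in many builds, but nothing in the .NET memory model promises that it will.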

On the write side, we've been told to push a value to a particular memory location, so we have to write to that location at least once. Eliding the write would mean keeping the previously known value in a separate register just to compare against it, which would likely be a deoptimization, especially if our thread never reads from that location.
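If you would rather not rely on the x86 behaviour described above, a variation of the question's program can ask for the visibility explicitly. The following is only a sketch under some assumptions: the class name FlagVisibilitySketch, the extra done flag (added so that thread b can be shut down) and the polling loop in thread a are not part of the original code.

```csharp
using System;
using System.Threading;

public static class FlagVisibilitySketch
{
    // Returns the value of flag that thread a ends up observing.
    public static bool Run()
    {
        var flag = true;
        var done = false;     // hypothetical extra flag so that b can be stopped
        var observed = true;

        var a = new Thread(() =>
        {
            // Poll a bounded number of times instead of relying on a fixed Sleep.
            for (var i = 0; i < 5000 && Volatile.Read(ref flag); i++)
            {
                Thread.Sleep(1);
            }
            observed = Volatile.Read(ref flag);
        });

        var b = new Thread(() =>
        {
            // Volatile.Write has release semantics and cannot be elided or
            // kept in a register by the JIT.
            Volatile.Write(ref flag, false);
            while (!Volatile.Read(ref done))
            {
                Thread.SpinWait(10);
            }
        });
        b.IsBackground = true;

        a.Start();
        b.Start();
        a.Join();   // Join also makes a's write to observed visible here

        Volatile.Write(ref done, true);
        b.Join();
        return observed;
    }
}
```

In practice FlagVisibilitySketch.Run() returns false, meaning thread a saw the write made by thread b. The volatile accessors express the intent in terms of the .NET memory model, so the result does not hinge on which processor family the code happens to run on.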