multithreading delphi multiprocessing critical-section

Avoiding Cache Consistency Issues in Delphi With a Critical Section?

I just read an MSDN article, "Synchronization and Multiprocessor Issues", that addresses memory cache consistency issues on multiprocessor machines. This was really eye-opening to me, because I would not have thought there could be a race condition in the example they provide. The article explains that writes to memory might not actually occur (from the perspective of the other CPU) in the order written in my code. This is a new concept to me!

The article provides two solutions:

Using the "volatile" keyword on variables that need cache consistency across multiple CPUs. This is a C/C++ keyword, and not available to me in Delphi.
Using InterlockedExchange() and InterlockedCompareExchange(). This is something I could do in Delphi if I had to. It just seems a little messy.

The article also mentions that "The following synchronization functions use the appropriate barriers to ensure memory ordering: •Functions that enter or leave critical sections".

This is the part I don't understand. Does this mean that any writes to memory that are confined to code protected by critical sections are immune from cache consistency and memory ordering issues? I have nothing against the Interlocked*() functions, but another tool in my tool belt would be good to have!

Solution

First of all, according to the language standards, volatile doesn't do what the article says it does. The acquire and release semantics of volatile are MSVC-specific. This can be a problem if you compile with other compilers or on other platforms. C++11 introduces language-supported atomic variables, which will hopefully, in due course, finally put an end to the (mis-)use of volatile as a threading construct.

Critical sections and mutexes are indeed implemented so that reads and writes of protected variables will be seen correctly from all threads.

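I think the best way to think of critical sections and mutexes (locks) is as devices to bring about serialization. That is, blocks of code protected by the same lock are executed serially, one after another without overlap. The serialization applies to memory access also: entering and leaving the lock act as memory barriers, so a thread that acquires the lock sees every write made by the thread that previously released it. There can be no problems due to cache coherence or read/write reordering, provided that every thread that touches the shared data, readers included, does so while holding that lock.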
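To make that concrete, here is a minimal sketch of a value-plus-flag hand-off guarded by a critical section. It assumes a Delphi version with namespaced units and anonymous threads (roughly XE2 or later); the names Publish, TryConsume, Data and Ready are made up for illustration and do not come from the MSDN article.

```delphi
program CriticalSectionDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.SyncObjs;

var
  Lock: TCriticalSection;   // guards both Data and Ready
  Data: Integer = 0;
  Ready: Boolean = False;

// Writer side: both writes happen while the lock is held.
procedure Publish(Value: Integer);
begin
  Lock.Acquire;
  try
    Data := Value;
    Ready := True;
  finally
    Lock.Release;
  end;
end;

// Reader side: the flag and the data are read under the same lock,
// so the reader can never see Ready = True with a stale Data.
function TryConsume(out Value: Integer): Boolean;
begin
  Value := 0;
  Lock.Acquire;
  try
    Result := Ready;
    if Result then
      Value := Data;
  finally
    Lock.Release;
  end;
end;

var
  Writer: TThread;
  Value: Integer;
begin
  Lock := TCriticalSection.Create;
  try
    Writer := TThread.CreateAnonymousThread(
      procedure
      begin
        Publish(42);
      end);
    Writer.FreeOnTerminate := False;  // we wait for and free it ourselves
    Writer.Start;

    // Poll until the writer has published (polling keeps the sketch short).
    while not TryConsume(Value) do
      Sleep(1);

    Writer.WaitFor;
    Writer.Free;
    Writeln('Value = ', Value);  // prints "Value = 42"
  finally
    Lock.Free;
  end;
end.
```

The important point is that the reader takes the lock too. Protecting only the writes would leave the reads exposed to exactly the reordering the article describes.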

Interlocked functions are implemented with atomic CPU instructions (on x86, LOCK-prefixed instructions, which historically locked the memory bus), and they also act as memory barriers. These functions are the building blocks of lock-free algorithms. What this means is that such algorithms don't use heavyweight locks like critical sections, but rather these lightweight hardware operations.
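As a rough illustration of the Interlocked approach, the sketch below has several threads bump a shared counter with InterlockedIncrement from the Windows API. It assumes a Windows target and a Delphi version with namespaced units and anonymous threads; the constants and the Worker procedure are invented for the example.

```delphi
program InterlockedDemo;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, System.SysUtils, System.Classes;

const
  ThreadCount = 4;
  IncrementsPerThread = 100000;

var
  Counter: Integer = 0;  // shared; only modified through InterlockedIncrement

procedure Worker;
var
  I: Integer;
begin
  for I := 1 to IncrementsPerThread do
    InterlockedIncrement(Counter);  // atomic read-modify-write, no lock taken
end;

var
  Threads: array[0..ThreadCount - 1] of TThread;
  I: Integer;
begin
  for I := 0 to ThreadCount - 1 do
  begin
    Threads[I] := TThread.CreateAnonymousThread(
      procedure
      begin
        Worker;
      end);
    Threads[I].FreeOnTerminate := False;  // we wait for and free them ourselves
    Threads[I].Start;
  end;

  for I := 0 to ThreadCount - 1 do
  begin
    Threads[I].WaitFor;
    Threads[I].Free;
  end;

  // Every increment is atomic, so none are lost.
  Writeln('Counter = ', Counter);  // prints "Counter = 400000"
end.
```

Replacing the call with a plain Inc(Counter) would typically lose updates on a multi-core machine, because the read, add and write are then separate steps that can interleave between threads.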

Lock-free algorithms can be more efficient than those based on locks, but they are very much harder to write correctly. Prefer critical sections over lock-free code unless profiling shows that the lock is a discernible performance bottleneck.

Another article well worth reading is The "Double-Checked Locking is Broken" Declaration.