As per the title, plus what are the limitations and gotchas.
For example, on x86 processors, alignment for most data types is optional - an optimisation rather than a requirement. That means a pointer may be stored at an unaligned address, which in turn means the pointer might be split across a cache-line boundary (or even a page boundary).
Obviously this could be arranged on any processor if you work hard enough (picking out particular bytes, etc.), but not in a way where you'd still expect the write operation to be indivisible.
I seriously doubt that a multicore processor can guarantee other cores a consistent all-before or all-after view of a pointer written in this unaligned, line-crossing situation.
Am I right? And are there any similar gotchas I haven't thought of?
The very notion of a single memory visible to all threads ceases to work with several cores having individual caches. StackOverflow questions on memory barriers may be of interest; say, this one.
I think a good example to illustrate the problem with the "single memory" model is this one. Initially, x = y = 0.
Thread 1:
X = x;
y = 1;
Thread 2:
Y = y;
x = 1;
Of course, there is a race condition. But the secondary problem, beyond the obvious race, is that one possible outcome is X = 1, Y = 1. This can happen even without compiler optimizations, even if you write the above two threads in assembly, because the hardware itself may reorder the loads and stores.