distributed-computing, distributed-system

How does Distributed Shared Memory work in the presence of caches and registers?


I recently read a few papers describing DSM.

Some of them tried to provide sequential consistency. I understand how the algorithm works and how it uses the page-fault handler to accomplish it. But one thing that puzzles me is that even though the distributed memory itself is sequentially consistent, the cache and registers on each machine may still break the sequential consistency model.
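
For concreteness, here is a minimal, single-node sketch of the page-fault mechanism those papers rely on ( heavily simplified and hypothetical: a real DSM would contact the page's remote owner inside the handler to fetch the latest contents and access rights, which is only hinted at in the comments; calling mprotect() inside a signal handler is itself a simplification that works on Linux ):

    /* minimal sketch of page-fault-driven access control in a DSM runtime */
    #define _GNU_SOURCE
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char  *dsm_page;                 /* one logically shared page     */
    static size_t page_size;

    static void fault_handler(int sig, siginfo_t *si, void *ctx)
    {
        (void)sig; (void)ctx;
        char *addr = (char *)si->si_addr;
        if (addr >= dsm_page && addr < dsm_page + page_size) {
            /* a real DSM would message the page's current owner here,
               copy in the latest contents and acquire read/write rights    */
            mprotect(dsm_page, page_size, PROT_READ | PROT_WRITE);
        } else {
            _exit(1);                       /* genuine crash, not our page   */
        }
    }

    int main(void)
    {
        page_size = (size_t)sysconf(_SC_PAGESIZE);
        dsm_page  = mmap(NULL, page_size, PROT_NONE,      /* no access yet  */
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_sigaction = fault_handler;
        sa.sa_flags     = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);

        dsm_page[0] = 42;                   /* faults; handler "fetches" it  */
        printf("after the fault: %d\n", dsm_page[0]);
        return 0;
    }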

How does it work with cache/registers?

Even DSM systems supporting relaxed consistency ( like the TreadMarks paper ), which do all synchronization in the Lock() / Unlock() operations, still have the same problem:

How do they handle caches and registers to prevent stale private copies on each machine?
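
For illustration, a minimal sketch of the kind of Lock() / Unlock() wrappers meant here. The names dsm_fetch_write_notices() and dsm_send_diffs() are hypothetical placeholders for the protocol messages a TreadMarks-like DSM exchanges; the point of the sketch is that the synchronization operations double as compiler and hardware memory barriers, so values held in registers or caches are not carried across them:

    /* sketch only: acquire/release points keep registers and caches honest */
    #include <stdatomic.h>
    #include <stdio.h>

    static void dsm_fetch_write_notices(int lock_id)
    {
        /* real DSM: learn which shared pages were modified by earlier
           releasers of this lock and mark the local copies invalid, so the
           next access page-faults and fetches fresh data                   */
        (void)lock_id;
    }

    static void dsm_send_diffs(int lock_id)
    {
        /* real DSM: make diffs of the pages written inside the critical
           section available to the next acquirer of this lock              */
        (void)lock_id;
    }

    void Lock(int lock_id)
    {
        dsm_fetch_write_notices(lock_id);
        /* acquire fence: in practice also a compiler barrier, so shared
           values cached in registers before Lock() are not reused after it */
        atomic_thread_fence(memory_order_acquire);
    }

    void Unlock(int lock_id)
    {
        /* release fence: all stores of the critical section are ordered
           before the diffs get published to the next acquirer              */
        atomic_thread_fence(memory_order_release);
        dsm_send_diffs(lock_id);
    }

    int shared_counter;                     /* imagine this on a DSM page    */

    int main(void)
    {
        Lock(0);
        shared_counter++;                   /* operates on a fresh copy      */
        Unlock(0);
        printf("%d\n", shared_counter);
        return 0;
    }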


Solution

  • DSM needs to be extended into a DCSM ( best combined with smart beyond-NUMA controls )

    A Distributed Coherent Shared Memory ( DCSM ) architecture is such a suitable memory architecture, in which physically separate areas of memory can still be addressed as one contiguous, logically shared address space.

    For this concept to work as requested above, the word "Coherent" is the key ( while the NUMA / L1, L2, { L3 | local-memory } management takes care of all the fine-grained CPU internals, from registers to CPU-local caches and local memory ).

    Industry-available, robust DCSM implementations must have solved this very consistency issue in order to ever become feasible. The user-view of such an underlying DCSM system is thus just a view of one big, monolithic ( abstract ) host, where, in spite of the many thousands of CPUs and all the physically distributed computing resources ( spanning CPUs, DRAM memory blocks, all kinds of IO devices incl. storage, all network interfaces, etc. ), the whole DCSM-integrated infrastructure still behaves as one coherent, highly performant "super"-host.

    So no direct user-code interactions are expected or needed, and one can spin up any legacy code, now having some 8000+ CPUs plus XYZ [TB] of RAM as a coherent space for pure in-RAM computing ( where XYZ can nowadays scale to a few hundred, if not thousands, limited more by one's budget than by principle ).

    One can easily imagine what to expect from such a computing device, having such a beast with such immense computing resources under the hood, where the user-code need not and does not care how or where the actual resource is physically harnessed, as the O/S-level abstraction lets the user-code assume it is simply there and runs coherently "across" the distributed compute resources of the DCSM infrastructure ( the plain-threads sketch at the end of this answer illustrates that ).

    Isn't this great?
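
    As a trivial illustration of what "no direct user-code interactions" means, the plain POSIX-threads code below contains nothing DCSM-specific at all; on a DCSM "super"-host the very same source simply sees more CPUs and more RAM, while coherence of the single shared address space is provided underneath the O/S ( the thread count is just an arbitrary example value ):

        /* ordinary legacy shared-memory code: nothing DCSM-specific in it   */
        #include <pthread.h>
        #include <stdio.h>

        #define N_THREADS 8     /* could as well be thousands on a DCSM box  */

        static long long partial[N_THREADS];   /* plain data in shared space */

        static void *work(void *arg)
        {
            long id = (long)arg;
            for (long i = 0; i < 1000000; ++i)
                partial[id] += i;              /* ordinary loads and stores  */
            return NULL;
        }

        int main(void)
        {
            pthread_t t[N_THREADS];
            long long total = 0;

            for (long i = 0; i < N_THREADS; ++i)
                pthread_create(&t[i], NULL, work, (void *)i);

            for (long i = 0; i < N_THREADS; ++i) {
                pthread_join(t[i], NULL);
                total += partial[i];
            }
            printf("total = %lld\n", total);
            return 0;
        }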