
How are distributed snapshot algorithms (the likes of Chandy-Lamport) implemented in real-world distributed systems?


Can anyone explain how distributed snapshot algorithms (for example, Chandy-Lamport) are implemented in the context of modern distributed systems?

Can you name an open-source system that uses this class of algorithms?

How does this theory really translate to the real world?


Solution

  • It can be useful for rollback recovery in network-on-chip (NoC) systems. It is also used to determine the global state of a system during a computation.

    For example, HP uses this kind of algorithm in a rollback-recovery protocol for crash-recovery hosts and fair-loss links. You can find an interesting article about it here:

    http://www.hpl.hp.com/techreports/2010/HPL-2010-155.pdf
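To make the "global state" and "rollback recovery" points concrete, here is a minimal in-process sketch of the Chandy-Lamport marker protocol. It assumes FIFO channels between every pair of processes, a single snapshot at a time, and a hypothetical bank-transfer workload; `Network`, `Process`, `transfer`, `restore` and the account names are illustrative only, not taken from the HP report or any real system. Each process records its balance when it sees its first marker, then records messages arriving on each other incoming channel until that channel's marker arrives. The recorded balances plus the recorded in-flight transfers add up to the initial total even though no process ever paused. `restore` shows the rollback use: reset every process to its recorded state and re-inject the recorded in-flight messages.

```python
from collections import deque


class Process:
    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance
        self.peers = peers            # all other process ids (complete graph, assumed)
        self.recorded_state = None    # local state captured for the snapshot
        self.channel_state = {}       # incoming channel -> messages recorded as in flight
        self.recording = set()        # incoming channels still being recorded


class Network:
    """Hypothetical simulator with one FIFO channel per ordered pair (src, dst)."""

    def __init__(self, balances):
        ids = list(balances)
        self.procs = {p: Process(p, b, [q for q in ids if q != p]) for p, b in balances.items()}
        self.channels = {(s, d): deque() for s in ids for d in ids if s != d}

    def send(self, src, dst, msg):
        self.channels[(src, dst)].append(msg)

    def transfer(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.send(src, dst, ("transfer", amount))

    def _record_and_broadcast(self, proc):
        # Record local state, send a marker on every outgoing channel,
        # and start recording every incoming channel.
        proc.recorded_state = proc.balance
        for q in proc.peers:
            self.send(proc.pid, q, ("marker",))
            proc.recording.add((q, proc.pid))
            proc.channel_state[(q, proc.pid)] = []

    def start_snapshot(self, pid):
        self._record_and_broadcast(self.procs[pid])

    def deliver(self, src, dst):
        ch = (src, dst)
        msg = self.channels[ch].popleft()      # FIFO: oldest message first
        proc = self.procs[dst]
        if msg[0] == "marker":
            if proc.recorded_state is None:    # first marker seen by this process
                self._record_and_broadcast(proc)
            proc.recording.discard(ch)         # marker closes this channel's recording
        else:
            proc.balance += msg[1]
            if ch in proc.recording:           # sent before the sender's snapshot
                proc.channel_state[ch].append(msg[1])

    def run_until_quiet(self):
        # Deterministic scheduler: always deliver from the first non-empty channel.
        while True:
            pending = [ch for ch in sorted(self.channels) if self.channels[ch]]
            if not pending:
                return
            self.deliver(*pending[0])

    def snapshot(self):
        if any(p.recorded_state is None or p.recording for p in self.procs.values()):
            raise RuntimeError("snapshot still in progress")
        states = {pid: p.recorded_state for pid, p in self.procs.items()}
        in_flight = {ch: msgs for p in self.procs.values()
                     for ch, msgs in p.channel_state.items() if msgs}
        return states, in_flight

    def restore(self, states, in_flight):
        # Rollback: fresh processes at the recorded states, channels refilled
        # with exactly the messages that were in flight at the cut.
        for pid, bal in states.items():
            self.procs[pid] = Process(pid, bal, self.procs[pid].peers)
        self.channels = {ch: deque(("transfer", a) for a in in_flight.get(ch, []))
                         for ch in self.channels}


# Demo: three hypothetical accounts holding 150 in total.
net = Network({"A": 100, "B": 50, "C": 0})
net.transfer("A", "B", 30)
net.start_snapshot("A")        # A initiates the snapshot
net.transfer("B", "C", 20)     # B sends before it has seen any marker
net.transfer("B", "A", 10)
net.run_until_quiet()
states, in_flight = net.snapshot()
print(states, in_flight)  # {'A': 70, 'B': 50, 'C': 0} {('B', 'A'): [10], ('B', 'C'): [20]}

live = {pid: p.balance for pid, p in net.procs.items()}
net.restore(states, in_flight)  # roll back, then replay the in-flight transfers
net.run_until_quiet()
```

No recorded balance alone equals a state the system was ever "frozen" in, yet the snapshot is consistent: 70 + 50 + 0 plus the 30 in flight is 150, and replaying from it reaches the same balances as the live run. The fixed delivery order only makes the demo reproducible; correctness relies on FIFO delivery per channel, not on a particular interleaving. A real deployment would also have to persist the snapshot and handle failures during it, which this sketch does not attempt.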