cadence-workflow

How does Cadence handle faults under various failure conditions?


Cadence is a fault-tolerant stateful code platform. How does it handle faults under various failure conditions?


Solution

  • There are all kinds of failures in distributed systems, and Cadence provides various options to handle them.

    Here is my own list. It may not be complete, but I will add more as I think of them.

    activity

    workflow

    Cadence server cluster

    Both activity and workflow workers are stateless.
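
    To make "stateless" concrete: Cadence keeps each workflow's event history on the server, and a worker rebuilds workflow state by replaying that history. The toy sketch below illustrates only that idea. `HistoryStore`, `rebuild_state`, and the event tuples are hypothetical names invented for this example, not part of any Cadence client API.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryStore:
    """Stands in for the durable event history kept by the server (hypothetical)."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)


def rebuild_state(history):
    """Recover workflow state purely from recorded events; the worker stores nothing."""
    state = {"completed": [], "total": 0}
    for kind, name, result in history.events:
        if kind == "activity_completed":
            state["completed"].append(name)
            state["total"] += result
    return state


store = HistoryStore()
store.append(("activity_completed", "charge_card", 30))
store.append(("activity_completed", "send_receipt", 12))

# Imagine worker A crashes here. Worker B starts with no local state
# and recovers everything it needs by replaying the stored history.
state_on_worker_b = rebuild_state(store)
print(state_on_worker_b)
```

    Because the state lives in the history rather than in the worker process, losing a worker loses nothing that another worker cannot reconstruct.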

    Cadence server is a highly available and scalable service that provides the durability.

    • The durability comes from the underlying design and the persistence store (Cassandra, MySQL, or Postgres).
    • In a single-cluster setup, the Cadence service runs with many independent shards. The cluster consists of multiple hosts, and any failed host can be replaced by another.
    • Cadence provides cross-data-center replication for much higher availability: https://cadenceworkflow.io/docs/concepts/cross-dc-replication/#global-domains-architecture
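
    The single-cluster point about shards and replaceable hosts can be pictured with a small simulation. This is a simplified sketch under stated assumptions, not how Cadence actually assigns shards: the shard count, the CRC-based hash, the round-robin assignment, and the host names are all made up for illustration.

```python
import zlib

NUM_SHARDS = 8  # assumption: a small fixed shard count for the toy example


def shard_for(workflow_id: str) -> int:
    """Map a workflow ID to a shard with a stable hash (illustrative choice)."""
    return zlib.crc32(workflow_id.encode()) % NUM_SHARDS


def assign_shards(hosts):
    """Spread every shard across the currently live hosts (round-robin)."""
    live = sorted(hosts)
    return {shard: live[shard % len(live)] for shard in range(NUM_SHARDS)}


hosts = {"host-a", "host-b", "host-c"}
before = assign_shards(hosts)

# host-b fails: its shards are picked up by the remaining hosts.
hosts.discard("host-b")
after = assign_shards(hosts)

wf = "order-1234"  # hypothetical workflow ID
shard = shard_for(wf)
print(f"{wf} -> shard {shard}: owner before={before[shard]}, after={after[shard]}")
```

    The workflow stays on the same shard throughout. Only the host that owns the shard changes, and since the shard's data sits in the persistence store, the new owner can continue from where the failed host stopped.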