couchdb database-cluster

Cluster vs replication


I have a use case where I want to replicate a single database across multiple servers (for HA and scalability purposes).

Would there be any disadvantage to running a 3-node replica setup instead of a 3-node cluster?


Solution

  • Section 11.2 of the CouchDB docs provides an example cluster configuration:

    [cluster]
      q=8
      r=2
      w=2
      n=3
    

    q - The number of shards.

    r - The number of copies of a document with the same revision that have to be read before CouchDB returns with a 200 and the document. If there is only one copy of the document accessible, then that is returned with 200.

    w - The number of nodes that need to save a document before a write is returned with 201. If the number of nodes saving the document is less than w but greater than 0, 202 is returned.

    n - The number of copies of every document (replicas).
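
As a rough illustration of the w rule, here is a minimal Python sketch. The function name `write_status` is hypothetical and not part of CouchDB. It assumes 202 means the write was stored on at least one node but on fewer than w; the case where no node stores the write is not covered by the definitions above and is treated as an error purely as an assumption.

```python
def write_status(acks: int, w: int) -> int:
    """Return the HTTP status for a write, given how many nodes saved it.

    acks: number of nodes that stored the document (hypothetical input)
    w:    the write quorum from the [cluster] section
    """
    if acks >= w:
        return 201  # quorum met: Created
    if acks > 0:
        return 202  # stored, but on fewer than w nodes: Accepted
    # Assumption: zero acknowledgements is an error in this sketch.
    raise RuntimeError("no node saved the document")
```

With w=2, one acknowledgement gives 202 and two give 201. With w=1, a single acknowledgement is already enough for 201, which is why the client cannot tell whether any other node has the write yet.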

    The behavior of your 3-node replica setup should be equivalent to:

    [cluster]
      q=1
      r=1
      w=1
      n=3
    

    when replication is working correctly. This is a possible clustering configuration, but not an optimal one, as it lacks:

    • Confirmation that multiple nodes, forming a majority, have saved a write before it is acknowledged.

    • Confirmation that multiple nodes, forming a majority, agree on a revision before it is returned.

    • Expandability of the database beyond a single node's storage via sharding.

    • The ability to move later to a configuration with q, r or w > 1 without first having to switch to a cluster.
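
To make the comparison concrete, the two configurations can be checked with a small Python sketch. The `ClusterConfig` class and its method names are hypothetical helpers for illustration, not a CouchDB API. The `r + w > n` overlap test is the usual quorum arithmetic, applied here as an assumption about what "a majority has confirmed" buys you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    q: int  # number of shards
    r: int  # read quorum
    w: int  # write quorum
    n: int  # copies of every document

    def majority(self) -> int:
        # Smallest number of copies that is more than half of n.
        return self.n // 2 + 1

    def write_needs_majority(self) -> bool:
        return self.w >= self.majority()

    def read_needs_majority(self) -> bool:
        return self.r >= self.majority()

    def read_overlaps_write(self) -> bool:
        # If r + w > n, every read quorum shares a node with every write quorum.
        return self.r + self.w > self.n

    def shard_copies(self) -> int:
        # Each of the q shards is stored n times.
        return self.q * self.n


docs_example = ClusterConfig(q=8, r=2, w=2, n=3)
replica_like = ClusterConfig(q=1, r=1, w=1, n=3)

for name, cfg in [("docs example", docs_example), ("replica-like", replica_like)]:
    print(name, cfg.write_needs_majority(), cfg.read_needs_majority(),
          cfg.read_overlaps_write(), cfg.shard_copies())
```

The docs example requires a majority for both reads and writes and spreads 24 shard copies over the nodes. The replica-like configuration requires neither and keeps 3 full copies of a single shard, so each node must hold the whole database.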

    Indirectly, the limits on acknowledgements create more potential conflicts to resolve between the replicas if they are actually used for network scalability. They also raise the odds of a real inconsistency, in the form of lost records, if a node fails between acknowledging a save and passing it on to the other replicas.
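
The lost-record scenario can be sketched as a toy model in Python. The function and node names are hypothetical, and the model assumes a failed node never comes back with its data, which is a deliberate simplification.

```python
from __future__ import annotations


def acknowledged_write_survives(holders: set[str], failed: set[str]) -> bool:
    """True if at least one node that stored the write is still alive.

    holders: nodes that had saved the write when it was acknowledged
    failed:  nodes lost before they passed the write on
    """
    return bool(holders - failed)


# w=1: only node1 had the write when the client got its acknowledgement.
print(acknowledged_write_survives({"node1"}, {"node1"}))           # False
# w=2: node1 and node2 both had it, so losing node1 alone is survivable.
print(acknowledged_write_survives({"node1", "node2"}, {"node1"}))  # True
```

In this model, an acknowledged write is lost only when every node holding it fails, so with w=1 a single failure is enough while with w=2 two nodes must fail in the same window.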