Search code examples
distributed-cachingdistributed-systemconsistent-hashing

Some followup questions about consistent hashing


I have read through a few articles that explain the theory behind consistent hashing. But most of them doesn't give much details about how to handle add/remove a node. I understand if it is used in cache layer like memcached, we might not need to do anything but if it is used in distributed storage, it is very critical to move some data to correct node. What exactly happened when we need to add/remove a node?

A few other questions are:

  1. what’s the best way to cope with servers of different sizes
  2. how to add and remove more than one machine at a time
  3. how to cope with replication and fault-tolerance

Hope someone could point me to an article that explain these.


Solution

  • But most of them doesn't give much details about how to handle add/remove a node.

    Have you read Dynamo: Amazon’s Highly Available Key-value Store? This is covered in some detail in section 4.

    what’s the best way to cope with servers of different sizes

    There's nothing stopping you from putting different amounts of data on different servers in a Dynamo-like or Cassandra-like system. It would add significant amounts of complexity, especially in the failure recovery cases, but doesn't break the fundamentals of the protcol in any way.