Search code examples
hashredisconsistent-hashing

Consistent hashing as a way to scale writes


I am trying to figure out if I am on the right track. I am building a (real-time) statistics/analytics service and I use redis to store some sets and hashes.

Now let's assume I have some success and I need to scale out. The hash ring technique looks nice, but I have an impression that it is only suited for caching scenarios.

What if a node goes down? In theory, its keys are now owned by other nodes. In practice, they won't have the data. It is lost, right? Same with adding / removing nodes.

Am I missing some fundamental thing? Can this be a poor man's cluster?


Solution

  • There are two reasons to use multiple nodes in a cluster:

    • Sharding to limit the amount of data stored on each node
    • Duplication to reduce read load and allow a node to be removed without data loss.

    The two are fundamentally different, but you can implement both - use consistent hashing to point to a set of nodes with a standard master/slave setup rather than a single node.

    If the cluster is your primary data store rather than a cache, you will need a different redistribution strategy that includes copying the data.

    My implementation is based on having the client choose one of 64k buckets for a hash and having a table that maps that bucket to a node. Initially, all map to node #1.

    When node #1 gets too large, its slave becomes master node #2 and the table is updated to map half of the node #1 keys to node #2. At this point all reads and writes will work with the new mapping and you just need to clean up the keys that are now on the wrong node. Depending on the performance requirements, you can check all keys at once or check a random selection of keys as the expiry system does.