Search code examples
scalabilitydistributedconsistent-hashingrendezvous-hash

Can Rendezvous hashing add a node efficiently?


The wikipedia article for Rendezvous hashing (https://en.wikipedia.org/wiki/Rendezvous_hashing) doesn't explain what happens when you add a node to the hash table. The way I understand it, if you add a node to a hash table implemented via Rendezvous hashing, there may be objects in other nodes that should actually map to this new node since it's hash values for those objects are higher than the ones for the nodes those objects are currently in. In order to fix this problem, you would need to scan the entire hash table, recompute the hash values, and move objects if needed. This is extremely costly performance wise.

The only way I see rendezvous hashing making any sense is if the hashtable acts as a cache and is backed by a database. Then if a node doesn't have an object, it can be fetched from the database. Also, if a node has an object but the key for that object no longer maps to that node, the node's cache algorithm will evict it (through LRU/LFU).

Am I understanding this correctly? Is there a way to fix this problem?


Solution

  • Great question! The wikipedia article actually touches that topic "If an object already in the system at ... it will be fetched afresh and cached".

    Basically, the proposed area for this algorithm is cases where you can cache a value, but it is ok to recache it later. While it does require extra processing, the implementation itself is dead simple. A real world example for this approach would be memcached - it uses this exact approach and does not care if you add/remove nodes - no rehashing is happening for existing keys.

    Another interesting note is about relation of Rendezvous and Consistent Hashing - Consistent hashing aims to move only some keys, the ones which map into the new partition. In Rendezvous hashing case, the same number of keys will be moved; even better that every existing node on average will give away same percentage of keys - but this comes at a cost of all keys have to be reprocessed.