amazon-dynamodb distributed-system

Virtual Nodes in Dynamo


I recently read the Dynamo paper, which describes Amazon's key/value storage system. Dynamo uses consistent hashing as its partitioning algorithm. To address load balancing and heterogeneity among machines, it applies a "virtual node" mechanism. Here are my questions:

  1. The paper says that "The number of virtual nodes that a node is responsible can decided based on its capacity", but which capacity does it mean? Is it compute capacity, network bandwidth, or disk volume?
  2. What technology is used to partition a node into "virtual nodes"? Is a virtual node just a process? Or is it something like a Docker container or a virtual machine?

Solution

  • Without going into specifics, the answer to #1 is: all of the above. The capacity may be determined empirically for each node type by running load tests and noting the results. This is similar to the process you would use to determine the capacity of a web server.

    As for your second question, the paper just says that you should think of nodes from a logical standpoint. To satisfy #1, positions in the ring are assigned so that one or more virtual nodes map to the same physical hardware, with higher-capacity machines owning more of them. So a virtual node is just a logical mapping, not a process, container, or virtual machine. It is one more layer of abstraction on top of the physical layer. If you are familiar with file systems, think of a virtual node like an inode vs. a disk cylinder (a comparison that is perhaps slightly dated).
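
To make the "logical mapping" idea concrete, here is a minimal sketch of a consistent-hashing ring in which each physical node owns several ring positions (its virtual nodes). It is an illustration, not Dynamo's actual implementation: the `Ring` class, the `name#vnodeN` token naming, the node names, and the vnode counts (50 vs. 150) are all assumptions made up for the example. In practice the counts would come from whatever capacity measurement you settle on.

```python
import bisect
import hashlib
from collections import Counter


def ring_position(key: str) -> int:
    """Hash a string to an integer position on the ring (MD5 used for illustration)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hashing ring with virtual nodes."""

    def __init__(self):
        self._positions = []  # sorted ring positions of all virtual nodes
        self._owner = {}      # ring position -> physical node name

    def add_node(self, name: str, vnode_count: int) -> None:
        # A virtual node is only an extra position on the ring that maps
        # back to the same physical node; no process or VM is involved.
        for i in range(vnode_count):
            pos = ring_position(f"{name}#vnode{i}")
            self._owner[pos] = name
            bisect.insort(self._positions, pos)

    def lookup(self, key: str) -> str:
        # Walk clockwise: pick the first virtual node whose position is
        # greater than the key's position, wrapping around at the end.
        pos = ring_position(key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._owner[self._positions[idx]]


# Assumed capacities: "large" is given three times as many virtual nodes.
ring = Ring()
ring.add_node("small", 50)
ring.add_node("large", 150)

counts = Counter(ring.lookup(f"key-{i}") for i in range(10_000))
print(counts)
```

With these assumed counts, the node with more virtual nodes should end up owning a proportionally larger share of the keys, which is how the ring accounts for heterogeneous hardware.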