Search code examples
kademlia

What is the meaning of bucket height in the Kademlia paper?


It said:

We start with some definitions. For a k-bucket covering the distance range 2i,2i+1 , define the index of the bucket to be i. Define the depth, h, of a node to be 160 − i, where i is the smallest index of a non-empty bucket. Define node y’s bucket height in node x to be the index of the bucket into which x would insert y minus the index of x’s least significant empty bucket. Because node IDs are randomly chosen, it follows that highly non-uniform distributions are unlikely. Thus with overwhelming probability the height of a any given node will be within a constant of log n for a system with n nodes. Moreover, the bucket height of the closest node to an ID in the kth-closest node will likely be within a constant of log k.

I can understand the definition of bucket height, but I don't know why we need that definition, and I don't understand the last sentence of the paragraph.


Updates: I also think that the paper has a typo: the bucket height should be the index of the bucket containing y minus the index of x’s least significant "NON-"empty bucket. Am I wrong?


Solution

  • but I don't know why we need that definition

    The argument for O(log n) efficiency of kademlia in terms of routing table size and lookup steps is based on mapping the entire keyspace of n nodes into k-buckets where further-away buckets cover exponentially larger fractions of the keyspace. Effectively compressing the whole network into a biased list of samples.

    Then arguments further down are then based on this bucket-based projection.

    Moreover, the bucket height of the closest node to an ID in the kth-closest node will likely be within a constant of log k.

    I think this is a convoluted way of saying that your k nearest neighbors will all end up in or near the same bucket, i.e. the deepest one (smallest index non-empty bucket).

    Note that this is expressed in terms of the flat layout, in the tree layout the smallest bucket would be akin to but not necessarily identical with the the own-ID-covering bucket.