I, already, reviewed a number of documents on this topic but there is something not exactly clear. For example bit torrent document (http://www.bittorrent.org/beps/bep_0005.html) states
The routing table is subdivided into "buckets" that each cover a portion of the space. An empty table has one bucket with an ID space range of min=0, max=2^160. When a node with ID "N" is inserted into the table, it is placed within the bucket that has min <= N < max. An empty table has only one bucket so any node must fit within it. Each bucket can only hold K nodes, currently eight, before becoming "full." When a bucket is full of known good nodes, no more nodes may be added unless our own node ID falls within the range of the bucket. In that case, the bucket is replaced by two new buckets each with half the range of the old bucket and the nodes from the old bucket are distributed among the two new ones. For a new table with only one bucket, the full bucket is always split into two new buckets covering the ranges 0..2^159 and 2^159..2^160.
It is somewhat different from other documents regarding kademlia routing table where buckets are arranged in accordance to the bit prefix of the node id but there is another confusing thing. When we reply to the "find node" request we have to find 8 closest nodes to the requested one using XOR operation. I saw some implementations just going through each item in the routing table executing XOR operation and thus finding 8 closest items. It seems to me too CPU wasting.
Everything is already in buckets. Even if we use the suggested by the bit torrent document system we can quicker find the bucket which could potentially contain the requested node id simply enumerating buckets and checking the minimum and maximum numbers on it. Then potentially that bucket should contain the closes nodes but they are value closest nodes not the XOR closest nodes (as I understand) which is somewhat different but somewhat similar.
I ran a simple test using numbers from 0 to 99 where I wanted to find 8 XOR closest numbers and they where in the proximity of the sought number but not right near it. Now, thinking about our buckets I guess it possible that all node ids in the bucket are the closest for a minor exception. So, for example if we take this bucket, take one from the left and one from the right and search for the XOR closest node ids we'll find what we are looking for and there is no point to go through ALL nodes in the routing table.
Am I right or am I missing something?
It is somewhat different from other documents regarding kademlia routing table where buckets are arranged in accordance to the bit prefix of the node id but there is another confusing thing.
The bittorrent specification describes a routing table implementation that only approximates the one described in the kademlia paper. It is easier to implement but has some shortcomings.
So, for example if we take this bucket, take one from the left and one from the right and search for the XOR closest node ids we'll find what we are looking for and there is no point to go through ALL nodes in the routing table.
In both - the full, tree-like routing table implementation and the simplified BEP5-variant - each bucket can be thought of as having a CIDR-like prefix (see this SO answer) consisting of prefix bits that the bucket covers and a mask bit count.
In the BEP5-variant each bucket's prefix is simply derived from the array index and your node's own ID. In the tree-like table buckets have to track their prefix due to bucket split/merge operations.
Using those prefixes you can find the bucket that covers the target key.
The thing is that buckets don't have to be full, and if you want to send, let's say 20 nodes in a response a single bucket will not suffice.
So you have to traverse the routing table (either sorted based on your own node ID or by the natural distance) in ascending distance (XOR) order relative to the target key to visit multiple buckets.
Since the XOR distance metric folds at each bit-carry (XOR == carry-less addition) it does not map nicely to any routing table layout. In other words, visiting the nearest buckets won't do.
Here's my implementation for finding the N closest nodes to a specific target key from a tree-like routing table.
I figure that many people simply iterate over the whole routing table because for regular nodes it will only contain a few dozen buckets at most and a DHT node does not see much traffic, so it only has to execute this operation a few times per second and if you implement this in a dense, cache-friendly datastructure then the lion's share might actually be the memory traffic and not the CPU instructions doing a few XORs and comparisons.
I.e. a full-table-scan is just easy to implement.
Let's assume we have a routing table with the following bit prefixes for each bucket. The letters serve as convenient names).
A 000...
B 00100...
C 0010100...
D 0010101000...
E 0010101001...
F 001010101...
G 00101011...
H 001011...
I 0011...
J 01...
K 1...
now let's say we're looking for this target key:
T = 0010011010111111001111100101011000001010010100001100100010011100000000100100000111001101100110110110101100010100111010111001101111110101001001001000100000001001
Also, the buckets aren't entirely full or we need more entries than whatever fits into a single bucket, so we'll have to visit more than one bucket to obtain the desired amount.
Now, the first bucket to visit is fairly obvious, it's B
since its prefix covers the target key.
Since B
's prefix is 5 bits long any entry in that bucket will have a XOR distance to T
of 00000???????...
. 5 prefix bits shared.
B
is the closest bucket to T
, which means that there cannot be any routing table entries closer than the relative distance 00000...
. Conversely that means that the minimal distance that any entry outside of B
can have is 00001...
. Which means the next-closest bucket must cover T xor 00001... -> 00101110101111110[...]
.
The bucket that covers this is H
.
H
is not adjacent to B
Ultimately - assuming we'll have to visit 6 buckets - the order will look like this:
00100... -> B
001011... -> H
001010101... -> F
0010101000... -> D
0010101001... -> E
00101011... -> G
This seems fairly chaotic. But if we plot the prefix-to-target-key distances for each bucket visited it becomes a little more obvious:
00000...
000010...
000011000...
0000110010...
0000110011...
00001101...
So the algorithm is as follows:
TL;DR: "just look one bucket left, one bucket right" is not sufficient. The correct algorithm is fairly involved, a linear scan over the whole table is easier to implement.