Search code examples
xorbittorrentdhtchordkademlia

What is the unidirectional property and why it helps with hotspots?


In the kademlia paper it's written that the XOR metric is unidirectional. What does it mean precisely? More importantly in what way it alleviates the problem of a frequently queried node? Could you explain me that from the point of view of a node? I mean, if I a hotspot am requested frequently by different nodes, do they exchange cached nodes to get to the target? Can't they just exchange the target ip? Furthermore, it doesn't seem to me that lookups converge along the same path as written, I think its more logical that each node follows a different path wile going farther and farther from itself.


Solution

  • The XOR metric means that A^B gives the same distance as B^A. I'm not sure that it directly alleviates the problem of a frequently query, it's more that nodes from different addresses in the network will perceive query nodes on a search path as having different distance from themselves, thereby caching different nodes after a query completes. Subsequent queries to local nodes will be given different remote nodes in response, thereby potentially spreading the load around the DHT network somewhat.

    When querying the DHT network, the more common query is to ask for data regarding a particular info hash. That's stored by the nodes with the smallest distances between their node IDs and the info hash in question. It's only when you begin querying nodes that are close to the target info hash that the IP addresses of peers start to respond with IP addresses of peers for that torrent. Nodes can't just arbitrarily return peer IPs, as that would require that all nodes store all IPs for all torrents, or that nodes perform subsequent queries on your behalf, which would be lead to suboptimal network use and be open to exploitation.

    Your observation that lookups don't converge on the same path is only correct when there are a surfeit of nodes at the distance being queried. Eventually as you get closer to nodes storing data for the desired info hash, there will be fewer and fewer nodes with such proximity to the target. Thus toward the end of queries, most querying nodes will converge on similar nodes. It's worth keeping in mind that this isn't a problem. Those nodes will only be "hot" for data related to that one particular info hash as the distance between info hashes is going to be very large on average on account of the enormous size of the hash space used. Additionally, were it a popular info hash to be querying for, nodes close to that hash that aren't coping with the traffic will be penalized by the network, and returned less often by nodes on the search path.