bittorrent, dht, kademlia

Mainline DHT: why is the hash in ping different from the hash in find_node?


I am working with a Mainline DHT implementation, and I have seen some strange behaviour.

Let's say I know a node's IP and port: 1.1.1.1:7777. I send it a "find_node" request with my own node hash (node ID) as the target. I get 8 nodes back; let's say the first one has the hash abcdeabcdeabcdeabcde and the address 2.2.2.2:8888. Now I send a "ping" request to 2.2.2.2:8888, and that node responds with a completely different hash from the one I got from 1.1.1.1:7777 in the "find_node" response. And I see that this is not an isolated case. What's going on? Why are the hashes of the same node different when they come from two different sources? Thanks for any answers.
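
For reference, here is a minimal Python sketch of the comparison described above. It assumes the compact node info layout from BEP 5 (a 20-byte node ID, a 4-byte IPv4 address and a 2-byte big-endian port per node). The helper names `parse_compact_nodes` and `id_matches` are hypothetical, and actually sending the KRPC queries is left out.

```python
import socket
import struct

NODE_ID_LEN = 20        # node IDs are 160 bits
COMPACT_NODE_LEN = 26   # 20-byte ID + 4-byte IPv4 address + 2-byte port


def parse_compact_nodes(blob: bytes):
    """Split the "nodes" string of a find_node response into (id, ip, port) tuples."""
    if len(blob) % COMPACT_NODE_LEN != 0:
        raise ValueError("nodes string is not a multiple of 26 bytes")
    nodes = []
    for off in range(0, len(blob), COMPACT_NODE_LEN):
        node_id = blob[off:off + NODE_ID_LEN]
        ip = socket.inet_ntoa(blob[off + 20:off + 24])
        (port,) = struct.unpack("!H", blob[off + 24:off + 26])  # network byte order
        nodes.append((node_id, ip, port))
    return nodes


def id_matches(advertised_id: bytes, ping_response_id: bytes) -> bool:
    """Compare the ID another node advertised with the ID the node itself reports."""
    return advertised_id == ping_response_id


# Build a one-entry "nodes" string using the values from the question.
blob = b"abcdeabcdeabcdeabcde" + socket.inet_aton("2.2.2.2") + struct.pack("!H", 8888)
nodes = parse_compact_nodes(blob)
```

The ID to compare against is the `id` field of the ping response from 2.2.2.2:8888; if `id_matches` returns False for it, you are seeing the situation described in this question.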


Solution

  • This may be a malicious node that does not keep its node ID consistent in an effort to get into as many routing tables as possible. It might be doing that for data harvesting or DoS amplification purposes.

    Generally, you shouldn't put too much trust in anything that remote nodes report, and you should sanitize the data. If a node does not keep its ID consistent, you should remove it from your routing table and disregard the results it returns to your queries. I have listed a bunch of possible sanitizing approaches beyond BEP42 in the documentation of my own DHT implementation.

    Another possibility is that node B (2.2.2.2:8888 in your example) simply changed its ID in the meantime (e.g. due to a restart) and node A (1.1.1.1:7777) either has not updated its entry yet or does not properly keep track of ID changes. But this shouldn't happen too frequently.

    "And I see that this is not an isolated case."

    Overall, I would only expect this behavior from a tiny fraction of the network. So you should compare the number of unique IP addresses sending bogus responses to the number of unique IPs sending sane ones. It's easy to get these kinds of statistics wrong if your implementation is naive and gets led by malicious nodes into contacting even more malicious nodes.

    That said, during a lookup you may see this more frequently in the terminal phase, when you get polluted data from nodes that do not sanitize their routing tables properly. As one example, old libtorrent versions did not (see related issue; note that I'm not singling out libtorrent here, many implementations are crappy in this area).
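
The ID consistency check and the per-IP comparison suggested above could be sketched roughly like this in Python. This is only an illustration under some assumptions: the class `IdConsistencyTracker` and its methods are hypothetical names, not part of any existing DHT library, and it treats every mismatch as suspicious even though a stale entry in the referring node or a restart can also cause one.

```python
from collections import defaultdict


class IdConsistencyTracker:
    """Tracks the node IDs that endpoints report about themselves (hypothetical helper)."""

    def __init__(self):
        self._reported = defaultdict(set)  # (ip, port) -> IDs seen in that node's own responses
        self._seen_ips = set()             # every IP that sent us a response
        self._mismatched_ips = set()       # IPs with at least one inconsistent response

    def check_response(self, ip, port, reported_id, expected_id=None):
        """Record a response and return True if its ID is consistent.

        expected_id is the ID another node advertised for this endpoint
        (e.g. in a find_node response), or None if we had no prior claim.
        """
        self._seen_ips.add(ip)
        ids = self._reported[(ip, port)]
        ids.add(reported_id)
        ok = len(ids) == 1 and (expected_id is None or expected_id == reported_id)
        if not ok:
            self._mismatched_ips.add(ip)
        return ok

    def ip_counts(self):
        """Return (sane, bogus) counted by unique IP, not by number of responses."""
        bogus = len(self._mismatched_ips)
        return len(self._seen_ips) - bogus, bogus


tracker = IdConsistencyTracker()
ok_a = tracker.check_response("1.1.1.1", 7777, b"A" * 20)
ok_b = tracker.check_response("2.2.2.2", 8888, b"X" * 20,
                              expected_id=b"abcdeabcdeabcdeabcde")
sane, bogus = tracker.ip_counts()
```

Counting by unique IP means that one misbehaving host answering many times does not inflate the bogus share. A caller would typically skip routing table insertion and discard the returned nodes whenever `check_response` returns False.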