network-programming p2p bittorrent dht anonymity

Enumerating Mainline DHT

I'm trying to understand why, historically, a DHT (distributed hash table) was a good system to use for decentralized p2p networks.

From an efficiency point of view, it's a fantastic way to let a bunch of nodes learn how to reach one another without complicated communication between them (using XOR distance in the case of Mainline DHT).
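For concreteness, here is a minimal sketch of the XOR metric as I understand it. Mainline DHT uses 160-bit IDs and a bucket size of 8; the helper names (`node_id`, `xor_distance`, `closest`) and the SHA-1-of-a-seed ID derivation are my own illustrative assumptions, not part of any real client.

```python
import hashlib

def node_id(seed: bytes) -> int:
    """Derive a 160-bit ID from arbitrary bytes (illustrative assumption only)."""
    return int.from_bytes(hashlib.sha1(seed).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: the bitwise XOR of two IDs, read as an integer."""
    return a ^ b

def closest(target: int, known: list[int], k: int = 8) -> list[int]:
    """Return the k known IDs with the smallest XOR distance to target."""
    return sorted(known, key=lambda n: xor_distance(n, target))[:k]
```

A node only needs to keep a handful of contacts per distance range and can still route a lookup toward any ID by repeatedly asking for `closest(target, ...)`.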

From an anonymity point of view, I don't think that's the case: I'd like to know whether it is possible to enumerate a DHT's nodes, and whether protection from this discovery is a problem that a DHT should even solve.

For example: imagine a DHT with 100 nodes. By virtue of the DHT's design (at least Mainline DHT), a node would (please correct me if I'm wrong):

know that resource X is on node Y
also know how to reach node Y

I know that a DHT crawler (like https://github.com/boramalper/magnetico) would be able to enumerate all nodes.

Is my reasoning correct, or did I misunderstand the attack vector?

Many thanks

Solution

BitTorrent makes no attempt to hide the IP address of any swarm member. On top of that, some trackers expose APIs that allow fetching a list of all infohashes and then, in turn, all IPs for each infohash. So in essence the set of BitTorrent peers was mostly public anyway; the DHT just adds another way to get this list.
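To show why the DHT itself gives this list away, here is a toy in-memory simulation, not a real crawler and with no networking. It assumes a simplified Kademlia-style routing table (160-bit IDs, up to 8 random contacts per bucket) and a `find_node` that returns the 8 closest known contacts. `Node`, `build_network`, and `crawl` are hypothetical names for this sketch; real crawlers such as magnetico work differently in detail.

```python
import random

ID_BITS = 160   # Mainline DHT node IDs are 160 bits
K = 8           # bucket size and number of contacts returned per query

class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.table: list[int] = []  # known contact IDs (all buckets, flattened)

    def find_node(self, target: int) -> list[int]:
        """Return the K known contacts closest to target by XOR distance."""
        return sorted(self.table, key=lambda n: n ^ target)[:K]

def build_network(n: int, rng: random.Random) -> dict[int, Node]:
    """Create n nodes, each holding up to K random contacts per bucket."""
    ids: set[int] = set()
    while len(ids) < n:
        ids.add(rng.getrandbits(ID_BITS))
    nodes = {i: Node(i) for i in ids}
    for node in nodes.values():
        buckets: dict[int, list[int]] = {}
        for other in ids:
            if other != node.id:
                # Bucket index is derived from the highest differing bit.
                buckets.setdefault((other ^ node.id).bit_length(), []).append(other)
        for members in buckets.values():
            node.table.extend(rng.sample(members, min(K, len(members))))
    return nodes

def crawl(nodes: dict[int, Node], bootstrap: int) -> set[int]:
    """Enumerate node IDs by querying every discovered node once per bucket."""
    seen = {bootstrap}
    queue = [bootstrap]
    while queue:
        current = queue.pop()
        for bit in range(ID_BITS):
            # Flipping one bit of the node's own ID targets exactly one bucket.
            target = current ^ (1 << bit)
            for contact in nodes[current].find_node(target):
                if contact not in seen:
                    seen.add(contact)
                    queue.append(contact)
    return seen

network = build_network(100, random.Random(0))
found = crawl(network, next(iter(network)))
print(len(found), "of", len(network), "nodes discovered")
```

Because every non-empty bucket holds at least one contact, a lookup can always step closer to any given ID, so in this model a crawler starting from a single bootstrap node reaches every participant. In the real network each ID comes with an IP address and port, which is exactly the information an observer wants.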

This isn't unique to the BitTorrent DHT; other p2p networks have similar properties.

Also note that participating in the DHT is not the same as participating in any particular torrent. A node may simply operate as a pure DHT node without any torrent client attached.