Search code examples
kademlia

Is it possible to compute an approximate size estimate of a kademlia network from a node's k-buckets?


Assuming that nodeids are evenly distributed, would it be possible to calculate an estimated number of nodes based on the k-bucket cache?

The reason I want this is that I want to create a kademlia network based on mainline DHT with BEP42 added (https://www.bittorrent.org/beps/bep_0042.html) that stores data with some level of trust that a trustworthy is actually providing it, and not a malicious actor who has an interest in altering the value for a given infohash key.

I want to use the estimated number of nodes to determine how much I can trust the answer a node gives me. So if a node gets a reply from a peer, then by using on the distance of the nodeid of the peer and the infohash requested, and the size of the network, I would calculate a trust score.

I'm assuming I could multiply the size of the k-buckets in each layer to get an estimate. For example, in the following diagram, https://docs.google.com/presentation/d/11qGZlPWu6vEAhA7p3qsQaQtWH7KofEC9dMeBFZ1gYeA/edit#slide=id.g1718cc2bc_01994

the total estimate would be, (by going bottom up): (3+2)(4+1)(4+1)*(4+1) = 625


Solution

  • Yes, it's possible to get an estimate this way. But it can be a fairly rough estimate. A better approach is to use the k-closest-node set of random find_node lookups which you'll have to do as part of routing table maintenance anyway. They provide more samples to calculate those estimates.

    If you're designing a new DHT from scratch rather than using the bittorrent DHT then any node-density based security measure is pretty weak since someone could always buy some capacity on a botnet to attack a specific key region or something like that. You should consider whether it's possible to use cryptography to secure whatever you want to secure.