I have a network of 300000 nodes and 800000 edges. How long does it take for igraph package in R to calculate network centrality measures for each node (including closeness and betweenness).
The runtimes for betweenness and closeness are both quadratic, so increase substantially as the number of nodes increases. These authors estimate 7,000 seconds to calculate betweenness for a graph with 325,000 edges. A graph with 800,000 edges will take much longer.
igraph
does have specific functions for large graphs - estimate_betweenness
and estimate_closeness
, which the manual says are not quadratic in runtime. You define a cutoff, which is the largest path length that will be included in the calculation. Traditionally, betweenness considers paths of any length. Defining a cutoff substantially cuts down the runtime:
> lg <- erdos.renyi.game(300000,800000,type="gnm")
> ptm <- proc.time()
> igraph::estimate_betweenness(lg, cutoff = 3)[1:10]
[1] 29 12 14 90 29 98 69 48 200 86
> proc.time() - ptm
user system elapsed
27.605 0.327 30.113
~ 30 sec. This is on a dual-core macbook air. As you increase the cutoff the runtime increases.
The tradeoff, of course, is that you have what amounts to an estimate of each node's betweenness score, rather than a direct calculation.
Kang, U., Papadimitriou, S., Sun, J., & Tong, H. (2011, April). Centralities in large networks: Algorithms and observations. In Proceedings of the 2011 SIAM International Conference on Data Mining (pp. 119-130). Society for Industrial and Applied Mathematics. Link