Search code examples
rnetwork-programmingigraph

how long does it take for igraph r to estimate network centrality measures for a large network


I have a network of 300000 nodes and 800000 edges. How long does it take for igraph package in R to calculate network centrality measures for each node (including closeness and betweenness).


Solution

  • The runtimes for betweenness and closeness are both quadratic, so increase substantially as the number of nodes increases. These authors estimate 7,000 seconds to calculate betweenness for a graph with 325,000 edges. A graph with 800,000 edges will take much longer.

    igraph does have specific functions for large graphs - estimate_betweenness and estimate_closeness, which the manual says are not quadratic in runtime. You define a cutoff, which is the largest path length that will be included in the calculation. Traditionally, betweenness considers paths of any length. Defining a cutoff substantially cuts down the runtime:

    > lg <- erdos.renyi.game(300000,800000,type="gnm")
    > ptm <- proc.time()
    > igraph::estimate_betweenness(lg, cutoff = 3)[1:10]
     [1]  29  12  14  90  29  98  69  48 200  86
    > proc.time() - ptm
       user  system elapsed 
     27.605   0.327  30.113 
    

    ~ 30 sec. This is on a dual-core macbook air. As you increase the cutoff the runtime increases.

    The tradeoff, of course, is that you have what amounts to an estimate of each node's betweenness score, rather than a direct calculation.


    Reference:

    Kang, U., Papadimitriou, S., Sun, J., & Tong, H. (2011, April). Centralities in large networks: Algorithms and observations. In Proceedings of the 2011 SIAM International Conference on Data Mining (pp. 119-130). Society for Industrial and Applied Mathematics. Link