
Fast calculation of eigenvector centrality takes too long in networkx


I am using networkx to calculate eigenvector centrality. The problem is that it takes way too long (already running for about 6 hours). Is there a faster way to get the results?

There are around 200,000 nodes in the graph and 60,000,000 edges.


Solution

  • A look at the source code shows that networkx.algorithms.centrality.eigenvector uses the power method to find the leading eigenvector.
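    As a rough illustration, the power method just multiplies a vector by the adjacency matrix and renormalizes until it converges to the leading eigenvector. This is a minimal sketch, not the actual networkx implementation, and the 4-node graph is a made-up example:

    ```python
    import numpy as np

    def power_iteration(A, max_iter=100, tol=1e-6):
        # Start from the uniform vector and repeatedly apply A;
        # the iterate converges to the leading (Perron) eigenvector.
        x = np.ones(A.shape[0]) / A.shape[0]
        for _ in range(max_iter):
            y = A @ x
            y /= np.linalg.norm(y)
            if np.linalg.norm(y - x) < tol:
                return y
            x = y
        return x

    # Toy adjacency: a triangle 0-1-2 with node 3 hanging off node 1.
    A = np.array([[0, 1, 1, 0],
                  [1, 0, 1, 1],
                  [1, 1, 0, 0],
                  [0, 1, 0, 0]], dtype=float)
    v = power_iteration(A)  # node 1 gets the highest score
    ```

    Note that plain power iteration only converges when the largest eigenvalue strictly dominates in magnitude, which is why convergence can be slow (or fail, e.g. on bipartite graphs) on some inputs.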

    If you stick with networkx, use this, as Joel noted:

    eigenvector_centrality_numpy

    centrality = nx.eigenvector_centrality_numpy(G)
    

    Alternatively:

    You can use scipy.sparse.linalg.eigs, which wraps ARPACK, and request only one eigenvector to be returned.

    Toy example:

    import scipy.sparse as sparse
    
    # Build the adjacency as a sparse matrix: a dense 200,000 x 200,000
    # array would need hundreds of GB, while sparse storage scales with the edges.
    X = sparse.random(2000, 2000, density=1e-3, format='csr', random_state=0)  # small stand-in
    # Note: k=1 and which='LR' request only the eigenvector belonging
    # to the eigenvalue with the largest real part.
    vals, vecs = sparse.linalg.eigs(X, k=1, which='LR')
    
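    In practice you would build the sparse matrix straight from the graph and map the ARPACK result back to node labels. A sketch under assumptions: an undirected networkx graph, networkx >= 2.7 for nx.to_scipy_sparse_array, and a random graph standing in for your data:

    ```python
    import networkx as nx
    import numpy as np
    import scipy.sparse as sparse

    # Stand-in graph; replace with your 200,000-node, 60,000,000-edge graph.
    G = nx.erdos_renyi_graph(1000, 0.01, seed=42)

    A = nx.to_scipy_sparse_array(G, dtype=float)         # sparse adjacency
    vals, vecs = sparse.linalg.eigs(A, k=1, which='LR')  # leading eigenpair via ARPACK

    # ARPACK returns a complex array with an arbitrary sign; for a connected
    # undirected graph the leading eigenvector can be taken real and non-negative.
    vec = np.abs(np.real(vecs[:, 0]))
    centrality = dict(zip(G.nodes(), vec / np.linalg.norm(vec)))
    ```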

    In any case, 200,000 by 200,000 is big, and depending on the sparsity and the nature of the matrix, the algorithm may take a long time. You will also need a lot of CPU and RAM.

    Additional tip for networkx.algorithms.centrality.eigenvector:

    If you stick with networkx, try relaxing the tolerance. The signature and defaults are:

    eigenvector_centrality(G, max_iter=100, tol=1e-06, nstart=None, weight=None)

    Try setting tol=1e-04 or even tol=1e-03; a looser tolerance lets the power iteration stop after fewer iterations, at the cost of some precision.
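    For instance, a sketch on a small built-in graph (on your 200,000-node graph the same call simply takes longer; max_iter=500 here is an illustrative safety margin, not a networkx default):

    ```python
    import networkx as nx

    G = nx.karate_club_graph()  # small stand-in for the real graph

    # A looser tol stops the power iteration earlier; a higher max_iter
    # gives headroom if convergence is slow on a large graph.
    centrality = nx.eigenvector_centrality(G, max_iter=500, tol=1e-04)
    ```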