I am using networkx to calculate eigenvector centrality. The problem is that it takes way too long (already running for about 6 hours). Is there a faster way to get the results?
There are around 200,000 nodes in the graph and 60,000,000 edges.
Looking at the source code shows that networkx.algorithms.centrality.eigenvector
uses the power method to find the leading eigenvector.
If you stick to networkx, use this, as Joel noted:
centrality = nx.eigenvector_centrality_numpy(G)
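A minimal sketch on a small random graph (standing in for your real 200,000-node G):

```python
import networkx as nx

# Toy graph standing in for the real 200,000-node network
G = nx.gnp_random_graph(100, 0.1, seed=42)

# eigenvector_centrality_numpy solves for the leading eigenvector
# directly via numpy/scipy instead of running the iterative power method
centrality = nx.eigenvector_centrality_numpy(G)

# centrality is a dict mapping node -> score
top_node = max(centrality, key=centrality.get)
```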
Alternatively:
You can use scipy.sparse.linalg.eigs, which uses ARPACK, and request only 1 eigenvector to be returned.
Toy example:
import networkx as nx
import scipy.sparse as sparse
# Build the adjacency matrix as a sparse array
# (200,000 by 200,000 for your graph)
X = nx.to_scipy_sparse_array(G, dtype=float)
# Note: k=1 requests only the leading eigenvector,
# and which='LR' selects the eigenvalue with the largest real part.
vals, vecs = sparse.linalg.eigs(X, k=1, which='LR')
In any case, 200,000 by 200,000 is big, and depending on the sparsity and the nature of the matrix the algorithm may still take a while. You will also need plenty of CPU and RAM.
Additional tip for networkx.algorithms.centrality.eigenvector:
If you stick with networkx, try relaxing the tolerance. The signature is:
eigenvector_centrality(G, max_iter=100, tol=1e-06, nstart=None, weight=None)
Try setting tol=1e-04 or even tol=1e-03.
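As a rough sketch of the trade-off on a toy graph (timings and accuracy on the real graph will differ):

```python
import networkx as nx

G = nx.gnp_random_graph(200, 0.05, seed=1)

# A tighter tolerance means more power iterations; a looser one means
# fewer, trading a little accuracy for speed.
tight = nx.eigenvector_centrality(G, max_iter=1000, tol=1e-06)
loose = nx.eigenvector_centrality(G, max_iter=1000, tol=1e-03)

# The resulting scores are usually very close even with the looser tolerance
max_diff = max(abs(tight[n] - loose[n]) for n in G)
```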