Search code examples
pythonnumpyscipynetworkxpagerank

Networkx: Differences between pagerank, pagerank_numpy, and pagerank_scipy?


Does anyone know about the differences in accuracy between the three different pagerank functions in Networkx?

I have a graph of 1000 nodes and 139732 edges, and the "plain" pagerank function didn't seem to work at all -- all but two of the nodes had the same PG, so I'm assuming this function doesn't work quite as well for large graphs?

pagerank_numpy's values also seemed to be a little bit more spread out than pagerank_scipy's values. The documentation for this function says that "This will be the fastest and most accurate for small graphs." What is meant by "small" graphs?

Also, why doesn't pagerank_numpy allow for max_iter and tol arguments?


Solution

  • Each of the three functions uses a different approach to solving the same problem:

    networkx.pagerank() is a pure-Python implementation of the power-method to compute the largest eigenvalue/eigenvector or the Google matrix. It has two parameters that control the accuracy - tol and max_iter.

    networkx.pagerank_scipy() is a SciPy sparse-matrix implementation of the power-method. It has the same two accuracy parameters.

    networkx.pagerank_numpy() is a NumPy (full) matrix implementation that calls the numpy.linalg.eig() function to compute the largest eigenvalue and eigenvector. That function is an interface to the LAPACK dgeev function which is uses a matrix decomposition (direct) method with no tunable parameters.

    All three should produce the same answer (within numerical roundoff) for well-behaved graphs if the tol parameter is small enough and the max_iter parameter is large enough. Which one is faster depends on the size of your graph and how well the power method works on your graph.

    In [12]: import networkx as nx
    
    In [13]: G=nx.gnp_random_graph(1000,0.01,directed=True)
    
    In [14]: %timeit nx.pagerank(G,tol=1e-10)
    10 loops, best of 3: 157 ms per loop
    
    In [15]: %timeit nx.pagerank_scipy(G,tol=1e-10)
    100 loops, best of 3: 14 ms per loop
    
    In [16]: %timeit nx.pagerank(G)
    10 loops, best of 3: 137 ms per loop