Search code examples
python-2.7networkxpagerank

NetworkX python : pagerank_numpy, pagerank fails but pagerank_scipy works


I am running PageRank on a weighted DiGraph where nodes = 61634, edges = 28,378.

  • pagerank(G) throws me ZeroDivsionError

  • pagerank_numpy(G) throws me ValueError : array to big

  • pagerank_scipy(G) gives me the page ranks though

I can understand that pagerank_numpy error would be due to memory limitations but why does pagerank fail? I tried adding an infinitesimal values to edges with zero weights but the same issues persist. Some pointers would be nice.

Link to my GraphML file - https://mega.co.nz/#!xlYzEDAI!Lyh5pD-NJL61JPfkrNyJrEm0NnFc586A0MUD8OMYAO0

NetworkX version - 1.8.1 Python - 2.7


Solution

  • pagerank fails because it performs its computation using stochastic_graph -- unlike pagerank_numpy or pagerank_scipy. From the docs, stochastic_graph requires:

    A NetworkX graph, must have valid edge weights

    This "valid edge weights" point (which is not explained at all, which I think is a mistake) is the source of your problem.

    For a directed graph, stochastic_graph uses the out_degree of each node to normalize the edges. Again from the docs:

    The [out] degree is the sum of the edge weights adjacent to the node.

    So when you have edges with zero weight or negative weight, the normalization process breaks with a ZeroDivisionError. The reason that negative weights are an issue is that they can cancel out positive weights, and thus give a node degree zero. For example, in your graph, node '2123271' has two edges who's weights sum to 0:

    >>> G.edges('2123271', data=True)
    [('2123271', '1712899', {'weight': -1L}),
     ('2123271', '890839', {'weight': 1L})]
    

    Replacing negative or zero edge weights in your graph with a small, positive edge weight made it so pagerank could run:

    In [1]: import networkx as nx
    In [2]: G = nx.read_graphml("your_graph.graphml")
    In [3]: defaultEdgeWeight = 0.01
    In [4]: for u, v, d in G.edges(data=True):
                if d['weight'] <= 0:
                    G[u][v]['weight'] = defaultEdgeWeight
    In [5]: P = nx.pagerank( G )
    

    Of course, pagerank didn't converge after 102 iterations, but that's another issue.