I am running PageRank on a weighted DiGraph with 61,634 nodes and 28,378 edges.

pagerank(G) throws a ZeroDivisionError.
pagerank_numpy(G) throws ValueError: array is too big.
pagerank_scipy(G) gives me the page ranks, though.

I can understand that the pagerank_numpy error could be due to memory limitations, but why does pagerank fail? I tried adding an infinitesimal value to edges with zero weight, but the same errors persist. Some pointers would be nice.
Link to my GraphML file - https://mega.co.nz/#!xlYzEDAI!Lyh5pD-NJL61JPfkrNyJrEm0NnFc586A0MUD8OMYAO0
NetworkX version: 1.8.1, Python: 2.7
pagerank fails because it performs its computation using stochastic_graph, unlike pagerank_numpy or pagerank_scipy. From the docs, stochastic_graph requires:

A NetworkX graph, must have valid edge weights

This "valid edge weights" requirement (which the docs don't explain at all, which I think is a mistake) is the source of your problem.
For a directed graph, stochastic_graph uses the out_degree of each node to normalize its edges. Again from the docs:

The [out] degree is the sum of the edge weights adjacent to the node.

So when you have edges with zero or negative weight, the normalization breaks with a ZeroDivisionError. Negative weights are a problem because they can cancel out positive weights and leave a node with out-degree zero. For example, in your graph, node '2123271' has two edges whose weights sum to 0:
>>> G.edges('2123271', data=True)
[('2123271', '1712899', {'weight': -1L}),
('2123271', '890839', {'weight': 1L})]
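To see why that sum matters, here is a minimal sketch of the normalization step stochastic_graph performs for a directed graph: each out-edge weight is divided by the node's weighted out-degree. The adjacency dict and helper function below are illustrative stand-ins, not NetworkX code; the node and weights mirror the example above.

```python
# Toy adjacency mapping: node -> {neighbor: weight}.
# Node '2123271' has weights that cancel, as in the graph above.
edges = {
    "2123271": {"1712899": -1, "890839": 1},
}

def normalize_out_edges(adj):
    """Sketch of stochastic_graph's per-node edge normalization."""
    stochastic = {}
    for node, nbrs in adj.items():
        # Weighted out-degree: the sum of this node's out-edge weights.
        out_degree = sum(nbrs.values())
        # When weights cancel (or are all zero), out_degree == 0 and this
        # division raises ZeroDivisionError -- the failure pagerank hits.
        stochastic[node] = {v: w / out_degree for v, w in nbrs.items()}
    return stochastic

try:
    normalize_out_edges(edges)
except ZeroDivisionError as exc:
    print("normalization failed:", exc)
```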
Replacing zero and negative edge weights in your graph with a small positive value lets pagerank run:
In [1]: import networkx as nx
In [2]: G = nx.read_graphml("your_graph.graphml")
In [3]: defaultEdgeWeight = 0.01
In [4]: for u, v, d in G.edges(data=True):
   ...:     if d['weight'] <= 0:
   ...:         G[u][v]['weight'] = defaultEdgeWeight
In [5]: P = nx.pagerank( G )
Of course, pagerank didn't converge after 102 iterations, but that's another issue.
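For reference, convergence is governed by the max_iter and tol parameters of pagerank. The function below is a hedged, self-contained sketch of the power iteration the library performs (parameter names mirror NetworkX's, but this is not the library's implementation, which also handles dangling nodes and personalization); the three-node graph is a toy example.

```python
def pagerank_power_iteration(adj, alpha=0.85, max_iter=100, tol=1e-6):
    """Toy PageRank power iteration over adj: node -> {neighbor: weight}."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        prev = rank
        # Teleportation term, then distribute each node's rank along
        # its out-edges in proportion to the (normalized) edge weights.
        rank = {v: (1.0 - alpha) / n for v in nodes}
        for u, nbrs in adj.items():
            out_degree = sum(nbrs.values())
            for v, w in nbrs.items():
                rank[v] += alpha * prev[u] * w / out_degree
        # Converged when the total change drops below n * tol
        # (the same stopping criterion shape NetworkX uses).
        if sum(abs(rank[v] - prev[v]) for v in nodes) < n * tol:
            return rank
    raise RuntimeError(
        "power iteration failed to converge in %d iterations" % max_iter)

# Toy graph: a -> b, b -> {a, c}, c -> a; all weights positive.
adj = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 1.0}, "c": {"a": 1.0}}
ranks = pagerank_power_iteration(adj)
```

Raising max_iter or loosening tol in the real nx.pagerank call works the same way when the default 100 iterations aren't enough.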