python networkx network-analysis

Why does networkx say my directed graph is disconnected when finding diameter?


I'm crawling the slideshare.net graph, starting at my own node and following users in BFS order until 1000 nodes have been visited. I perform the BFS in the following way:

from urllib.request import urlopen
from collections import deque
import sys
import json
import codecs
import csv
import io
import hashlib
import time
import xml.etree.ElementTree as etree

queue = deque(["himanshutyagi11"])   # seed user
visited = set()                      # users that have already been crawled
# Stop after 1000 users, or earlier if the queue runs empty.
while queue and len(visited) < 1000:
    vertex = queue.pop()             # crawl() adds with appendleft(), so pop() gives FIFO order
    if vertex in visited:
        continue
    visited.add(vertex)
    length_visited = len(visited)
    print(vertex, length_visited)
    crawl(vertex)

crawl() is a function in which I build the SlideShare API query (as explained here) from my shared_secret and api_key (given at the time of API registration), send the query, and parse the XML response stored in the variable 'response'. After parsing, I add the contacts of the current node to the queue.

def crawl(username):
    # shared_secret and api_key are assumed to be defined elsewhere in the script.
    request_timestamp = int(time.time())
    request_hash_string = shared_secret+str(request_timestamp)
    request_hash_value = hashlib.sha1(request_hash_string.encode('utf-8')).hexdigest()
    request_url = 'https://www.slideshare.net/api/2/get_user_contacts?username_for='+username+'&api_key='+api_key+'&hash='+request_hash_value+'&ts='+str(request_timestamp)
    response = etree.parse(urlopen(request_url)).getroot()
    # Append all the adjacent nodes of this user to the queue.
    for child in response:
        friend_name = child[0].text
        queue.appendleft(friend_name)
    # Record one directed edge "username,friend" per contact.
    with open('non_anonymized.csv', 'a') as edge_file:
        for child in response:
            f_name = child[0].text                          # name of the friend
            edge_file.write(username+','+f_name+'\n')       # username is the user currently being crawled

While crawling I also create an edgelist.csv file that contains all the edges in the graph. This file seems to be fine. Other functions such as degree(), in_degree() and average_clustering() also seem to be working fine.
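A minimal sketch of how such an edge file could be loaded into a directed graph, assuming one `source,target` pair per line as written by the crawl code. The in-memory sample data and the user names are made up and only stand in for the real file:

```python
import csv
import io

import networkx as nx

# Stand-in for the edge file written while crawling (hypothetical user names).
edge_csv = io.StringIO("alice,bob\nalice,carol\nbob,dave\n")

# Each row is one directed edge: the crawled user points to a contact.
graph = nx.DiGraph()
graph.add_edges_from((src, dst) for src, dst in csv.reader(edge_csv))

print(graph.number_of_nodes(), graph.number_of_edges())
# dict() works whether in_degree() returns a dict or a degree view.
print(dict(graph.in_degree()))
```

Because every edge is stored with a direction, "alice" has outgoing edges here but no incoming ones.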

Then I use networkx to create a graph, which has 1000 nodes. But when I try to find the diameter of this graph using the following call:

diameter = nx.diameter(graph)

With the above code I am not able to find the diameter of my graph: the call does not return anything and my program is stuck at this line. Any insight into what might be happening? Mine is a connected graph. I'm converting it to an undirected one using the to_undirected() function. I tried running it with the directed graph, and I got the following error:
networkx.exception.NetworkXError: Graph not connected: infinite path length

My question is: how can it be disconnected, since I am using BFS to crawl?

Python 3.4
Networkx 1.9.1


Solution

  • The source code for diameter is here. It relies on eccentricity which is the function just above that in the source code. eccentricity finds the shortest path from a node to all other nodes. The error message you are getting comes from this part of the code:

    if L != order:
        msg = "Graph not connected: infinite path length"
        raise networkx.NetworkXError(msg)
    

    Here L is the number of nodes that were reachable from a given node and order is the number of nodes in the network. L != order means that there are nodes not reachable from the given node. In the case of an undirected network, this means the network is not connected. However, in your case you have a directed network. For a directed network, L != order means that the network is not "strongly-connected". It could actually be weakly-connected, and yours presumably is: a BFS crawl only follows edges outward from the start node, so every crawled node is reachable from the start node, but there need not be a directed path back.

    So you've hit an error message that is not quite accurate.

    For the directed network you've created, the diameter is infinite: if there is a pair of nodes u, v with no directed path from u to v, the distance from u to v is infinite, and so is the diameter.
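The reachability check described above can be sketched in plain Python. This is only an illustration of the idea, not networkx's implementation: the `reachable_count` helper, the tiny edge list and the node names are all made up for the example.

```python
from collections import deque


def reachable_count(adj, source):
    """Count the nodes reachable from source (itself included) by BFS."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen)


# Hypothetical crawl result: "seed" follows a and b, a follows c,
# and nobody follows anyone back.
edges = [("seed", "a"), ("seed", "b"), ("a", "c")]

directed = {}    # adjacency that respects edge direction
undirected = {}  # adjacency that ignores edge direction
for u, v in edges:
    directed.setdefault(u, set()).add(v)
    directed.setdefault(v, set())
    undirected.setdefault(u, set()).add(v)
    undirected.setdefault(v, set()).add(u)

order = len(directed)  # number of nodes in the graph

# The "L" of the check above, computed for every possible start node.
directed_counts = {n: reachable_count(directed, n) for n in directed}
undirected_counts = {n: reachable_count(undirected, n) for n in undirected}

# Strongly connected: every node reaches all others along edge directions.
strongly_connected = all(c == order for c in directed_counts.values())
# Weakly connected: the same holds once directions are ignored.
weakly_connected = all(c == order for c in undirected_counts.values())

print(order, directed_counts)
print(strongly_connected, weakly_connected)
```

Only the seed reaches all four nodes along edge directions, so the check L != order fails for every other start node even though the graph is connected once directions are ignored. In networkx itself, `nx.is_strongly_connected(graph)` and `nx.is_weakly_connected(graph)` report these two properties for a directed graph.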