
Creating a graph using NetworkX


I am trying to make a graph for this data using the following code:

import networkx as nx
import csv
import matplotlib.pyplot as plt

graph = nx.Graph()
filename = "tubedata.csv"

with open(filename) as tube_data:
    starting_station = [row[0] for row in csv.reader(tube_data, delimiter=',')]

with open(filename) as tube_data:
    ending_station = [row[1] for row in csv.reader(tube_data, delimiter=',')]
    
with open(filename) as tube_data:
    average_time_taken = [row[3] for row in csv.reader(tube_data, delimiter=',')]
    
with open(filename) as tube_data:
    for line in tube_data:
        graph.add_edge(starting_station, ending_station, weight=average_time_taken)

However, I keep getting the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_101/53822893.py in <module>
     17 with open(filename) as tube_data:
     18     for line in tube_data:
---> 19         graph.add_edge(starting_station, ending_station, weight=average_time_taken)

/opt/conda/lib/python3.9/site-packages/networkx/classes/graph.py in add_edge(self, u_of_edge, v_of_edge, **attr)
    870         u, v = u_of_edge, v_of_edge
    871         # add nodes
--> 872         if u not in self._node:
    873             self._adj[u] = self.adjlist_inner_dict_factory()
    874             self._node[u] = self.node_attr_dict_factory()

TypeError: unhashable type: 'list'

I searched for the error and understand that I need to pass an immutable (hashable) data structure, so I changed the code to the following:

with open(filename) as tube_data:
   starting_station = (row[0] for row in csv.reader(tube_data, delimiter=','))

with open(filename) as tube_data:
   ending_station = (row[1] for row in csv.reader(tube_data, delimiter=','))
   
with open(filename) as tube_data:
   average_time_taken = (row[3] for row in csv.reader(tube_data, delimiter=','))
   
with open(filename) as tube_data:
   for line in tube_data:
       graph.add_edge(starting_station, ending_station, weight=average_time_taken)

This resolves the above error, but it produces a graph with only two nodes and one edge. How can I capture the full data as a graph?


Solution

  • I would create the graph using the following steps:

    1. Use the pandas library to read the data into a DataFrame
    2. Create an edge list [(source, target, weight)] from the DataFrame rows
    3. Create an empty directed graph (DiGraph) in NetworkX
    4. Add the edges to the DiGraph object by passing in the edge list

    import networkx as nx
    import pandas as pd
    
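    # header=None: the file has no header row, so columns are labelled 0, 1, 2, ...
    data = pd.read_csv('tubedata.csv', header=None)
    
    # One (source, target, weight) tuple per row; column 3 holds the time taken
    edgelist = data.apply(lambda x: (x[0], x[1], x[3]), axis=1).to_list()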
    
    # edgelist
    # [('Harrow & Wealdstone', 'Kenton', 3),
    #  ('Kenton', 'South Kenton', 2),
    #  ('South Kenton', 'North Wembley', 2),
    #  ('North Wembley', 'Wembley Central', 2),...
    
    G = nx.DiGraph()
    G.add_weighted_edges_from(edgelist)
    
    list(G.edges(data=True))[:5]
    # >>>[('Harrow & Wealdstone', 'Kenton', {'weight': 3}),
    #     ('Kenton', 'South Kenton', {'weight': 2}),
    #     ('South Kenton', 'North Wembley', {'weight': 2}),
    #     ('North Wembley', 'Wembley Central', {'weight': 2}),
    #     ('Wembley Central', 'Stonebridge Park', {'weight': 3})]
    

    You can also build the graph directly with from_pandas_edgelist (see the NetworkX documentation) after renaming the DataFrame columns. The edges are the same, but the edge attribute is then named average_time_taken instead of weight:

    data = data.rename(columns={0:'source',1:'target',3:'average_time_taken'})
    
    G2 = nx.from_pandas_edgelist(data, source='source', target='target',
                                 edge_attr='average_time_taken', create_using=nx.DiGraph)
    
    
    list(G2.edges(data=True))[:5]
    # [('Harrow & Wealdstone', 'Kenton', {'average_time_taken': 3}),
    # ('Kenton', 'South Kenton', {'average_time_taken': 2}),
    # ('South Kenton', 'North Wembley', {'average_time_taken': 2}),
    # ('North Wembley', 'Wembley Central', {'average_time_taken': 2}),
    # ('Wembley Central', 'Stonebridge Park', {'average_time_taken': 3})]
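
If you would rather stay with the csv module, here is a minimal sketch of the loop from the question, fixed so that it adds one edge per row. In the original code, add_edge received the whole column (a list, which is unhashable, or a generator object, which is hashable but is a single object) instead of one station name per row. The in-memory sample stands in for tubedata.csv: the station names and times are taken from the output above, while the third column value and the build_graph helper are made up for illustration. I am also assuming the file has no header row and that column 3 is numeric.

```python
import csv
import io

import networkx as nx

# Made-up stand-in for tubedata.csv: start, end, <unused column>, minutes.
# The third column is a placeholder because its real contents are not shown.
SAMPLE = io.StringIO(
    '"Harrow & Wealdstone","Kenton","x",3\n'
    '"Kenton","South Kenton","x",2\n'
    '"South Kenton","North Wembley","x",2\n'
)


def build_graph(rows):
    """Add one weighted edge per CSV row (hypothetical helper)."""
    graph = nx.Graph()  # undirected, as in the question
    for row in rows:
        # row[0] and row[1] are single strings, which are hashable nodes.
        # The csv module yields strings, so the time is converted explicitly.
        graph.add_edge(row[0], row[1], weight=float(row[3]))
    return graph


graph = build_graph(csv.reader(SAMPLE))
print(graph.number_of_nodes(), graph.number_of_edges())

# What the second attempt in the question did: the two generator objects
# themselves became the nodes, so every call re-added the same single edge.
starts = (r[0] for r in [])
ends = (r[1] for r in [])
broken = nx.Graph()
for _ in range(3):
    broken.add_edge(starts, ends, weight=None)
print(broken.number_of_nodes(), broken.number_of_edges())
```

For the real file, pass csv.reader(f) to build_graph inside a with open("tubedata.csv", newline="") as f: block, so the file is read only once. Swap nx.Graph() for nx.DiGraph() if the direction of travel matters.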