pandas DataFrame edge list to networkX graph object


I am trying to create an undirected graph from a DataFrame, formatted_unique_edges. The 'weight' column will be used purely for edge colouring in a downstream plotly visualisation:

    source      target      weight
0   protein_2   protein_3   3
1   protein_2   protein_6   2
2   protein_3   protein_6   2
3   protein_2   protein_4   2
4   protein_2   protein_5   2
5   protein_3   protein_4   2
6   protein_3   protein_5   2
7   protein_4   protein_5   2
8   protein_4   protein_6   1
9   protein_5   protein_6   1

The first lines of the linked plotly example, which I am trying to emulate, are:

G = nx.random_geometric_graph(200, 0.125)
edge_x = []
edge_y = []
for edge in G.edges():
    x0, y0 = G.nodes[edge[0]]['pos']
    x1, y1 = G.nodes[edge[1]]['pos']
    edge_x.append(x0)
    edge_x.append(x1)
    edge_x.append(None)
    edge_y.append(y0)
    edge_y.append(y1)
    edge_y.append(None)

I first convert formatted_unique_edges to a Graph, then try to emulate the code above, with some diagnostic print statements:

G = nx.from_pandas_edgelist(formatted_unique_edges, 
                            edge_attr=True) 
#also tried G = nx.random_geometric_graph(200, 0.125) as per plotly example

edge_x = []
edge_y = []
for edge in G.edges():
    print(edge) #('proteinN', 'proteinM')
    print(G.nodes[edge[0]]) #{}
    print(G.nodes[edge[1]]) #{}
    x0, y0 = G.nodes[edge[0]]['pos']
    #####
    #THROWS KeyError: 'pos' if G is from formatted_unique_edges
    #####
    #prints {'pos': [float, float]} if G is from nx.random_geometric_graph
    x1, y1 = G.nodes[edge[1]]['pos']
    edge_x.append(x0)
    edge_x.append(x1)
    edge_x.append(None)
    edge_y.append(y0)
    edge_y.append(y1)
    edge_y.append(None)

As noted in the code comments, I get a KeyError from G.nodes[edge[0]]['pos']. Looking in the Spyder variable explorer, G.nodes._nodes for the graph from nx.random_geometric_graph has the format:

{0   : {'pos' : [pos_float, pos_float]}, 
 1   : {'pos' : [pos_float, pos_float]},
 ...
 199 : {'pos' : [pos_float, pos_float]}
}

Whereas G.nodes._nodes for the graph built from formatted_unique_edges has the format:

{'protein_2' : {},
 'protein_3' : {},
 'protein_4' : {},
 'protein_5' : {},
 'protein_6' : {}}

This all suggests I am making my Graph object from formatted_unique_edges incorrectly with nx.from_pandas_edgelist - can someone advise how I should be doing it?
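The difference between the two graphs can also be checked without the variable explorer. This is only a minimal sketch: the two-row `edges` DataFrame is a cut-down stand-in for formatted_unique_edges, and the names `G_df` and `G_rgg` are made up for illustration.

```python
import networkx as nx
import pandas as pd

# Cut-down stand-in for formatted_unique_edges (first two rows of the table)
edges = pd.DataFrame(
    {
        "source": ["protein_2", "protein_2"],
        "target": ["protein_3", "protein_6"],
        "weight": [3, 2],
    }
)

G_df = nx.from_pandas_edgelist(edges, edge_attr=True)
G_rgg = nx.random_geometric_graph(5, 0.5)

# Collect the 'pos' attribute of every node that has one
pos_df = nx.get_node_attributes(G_df, "pos")
pos_rgg = nx.get_node_attributes(G_rgg, "pos")
print(len(pos_df), len(pos_rgg))

# Edge attributes, by contrast, are carried over from the DataFrame
print(G_df.edges["protein_2", "protein_3"]["weight"])
```

If the first count is zero while the edge weight prints as expected, the edges themselves were read in correctly and only the node coordinates are missing.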

Thanks! Tim


Solution

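  • You have not generated a layout for your graph. random_geometric_graph does more than build a graph: it also gives every node random coordinates and stores them in the 'pos' node attribute. A graph built with from_pandas_edgelist has no such attribute, so you need to compute the positions yourself and attach them.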

    import networkx as nx

    # Convert your DataFrame to a graph; edge_attr=True keeps 'weight' on each edge
    G = nx.from_pandas_edgelist(formatted_unique_edges, edge_attr=True)
    
    # Compute a layout (node -> [x, y]) and store it as the 'pos' node attribute
    pos = nx.spring_layout(G)
    nx.set_node_attributes(G, pos, 'pos')
    
    edge_x = []
    edge_y = []
    for edge in G.edges():
        x0, y0 = G.nodes[edge[0]]['pos']
        x1, y1 = G.nodes[edge[1]]['pos']
        edge_x.append(x0)
        edge_x.append(x1)
        edge_x.append(None)
        edge_y.append(y0)
        edge_y.append(y1)
        edge_y.append(None)
    

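    Output (the exact coordinates will differ between runs, because spring_layout starts from random positions):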

    >>> G.nodes._nodes
    {'protein_2': {'pos': array([0.5830424, 0.0301945])},
     'protein_3': {'pos': array([-0.42158911,  0.33654032])},
     'protein_6': {'pos': array([0.30069049, 1.        ])},
     'protein_4': {'pos': array([-0.71990583, -0.51877307])},
     'protein_5': {'pos': array([ 0.25776204, -0.84796174])}}
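Since the question mentions colouring edges by 'weight', here is one possible way to extend the loop above, offered as a sketch and not as the only approach. It keeps a separate pair of coordinate lists per weight value. The assumption is that each pair would then go into its own plotly line trace with its own colour, since as far as I know a single go.Scatter line trace takes only one line colour. The `seed=42` argument is an arbitrary choice to make the layout repeatable.

```python
from collections import defaultdict

import networkx as nx
import pandas as pd

# Edge list copied from the question
formatted_unique_edges = pd.DataFrame(
    {
        "source": ["protein_2", "protein_2", "protein_3", "protein_2", "protein_2",
                   "protein_3", "protein_3", "protein_4", "protein_4", "protein_5"],
        "target": ["protein_3", "protein_6", "protein_6", "protein_4", "protein_5",
                   "protein_4", "protein_5", "protein_5", "protein_6", "protein_6"],
        "weight": [3, 2, 2, 2, 2, 2, 2, 2, 1, 1],
    }
)

G = nx.from_pandas_edgelist(formatted_unique_edges, edge_attr=True)

# seed is optional; fixing it makes the layout reproducible between runs
pos = nx.spring_layout(G, seed=42)
nx.set_node_attributes(G, pos, "pos")

# One pair of coordinate lists per distinct weight value
edge_x = defaultdict(list)
edge_y = defaultdict(list)
for u, v, weight in G.edges(data="weight"):
    x0, y0 = G.nodes[u]["pos"]
    x1, y1 = G.nodes[v]["pos"]
    # None breaks the line so consecutive edges are not joined together
    edge_x[weight].extend([x0, x1, None])
    edge_y[weight].extend([y0, y1, None])

for weight in sorted(edge_x):
    n_edges = len(edge_x[weight]) // 3
    print(f"weight {weight}: {n_edges} edge(s)")
```

Each `edge_x[w]` / `edge_y[w]` pair has the same x0, x1, None structure as the single lists in the plotly example, so the rest of that example should need few changes apart from looping over the weights.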