Tags: python, pandas, dataframe, graph, networkx

How to make networkx edges from pandas dataframe rows


For context: I am building a visual graph of a protein-protein interaction network. Each node corresponds to a protein, and an edge indicates an interaction between two proteins.

Here is my code:

First, I import the modules I need and load the data files:

import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd

interactome_edges = pd.read_csv("*a_directory*", delimiter = "\t", header = None)
interactome_nodes = pd.read_csv("*a_directory*", delimiter = "\t", header = None)

# A few adjustments for the dataframes
interactome_nodes = interactome_nodes.drop(columns = [0])
interactome_edges.columns = ["node1","node2"]
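As an aside, read_csv can assign column labels while reading through its names parameter, which avoids renaming the columns afterwards. A minimal sketch, assuming a small hypothetical in-memory TSV (via io.StringIO) in place of the elided file path:

```python
import io

import pandas as pd

# Hypothetical tab-separated edge data standing in for the real file
# (the actual path is elided in the question).
edge_tsv = "MET3\tMET3\nIMD3\tIMD4\nOLE1\tOLE1\n"

# header=None: the file has no header row. names= labels the columns at
# read time instead of leaving them numbered 0, 1, ...
edges = pd.read_csv(io.StringIO(edge_tsv), sep="\t", header=None,
                    names=["node1", "node2"])

print(edges)
print(list(edges.columns))  # ['node1', 'node2']
```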

The nodes dataframe looks like this:

    1
0   MET3
1   IMD3
2   OLE1
3   MUP1
4   PIS1
...

The edges dataframe looks like this:

    node1   node2
0   MET3    MET3
1   IMD3    IMD4
2   OLE1    OLE1
3   MUP1    MUP1
4   PIS1    PIS1
...

Each row describes an edge from node1 to node2.
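Note that nx.Graph, which the code below uses, is undirected, so the "from node1 to node2" direction is not recorded. If direction matters, nx.DiGraph keeps it (nx.from_pandas_edgelist also accepts create_using=nx.DiGraph). A minimal illustration using one edge from the table above:

```python
import networkx as nx

# nx.Graph is undirected: the edge (IMD3, IMD4) is the same as (IMD4, IMD3).
undirected = nx.Graph()
undirected.add_edge("IMD3", "IMD4")
print(undirected.has_edge("IMD4", "IMD3"))  # True

# nx.DiGraph keeps the node1 -> node2 direction.
directed = nx.DiGraph()
directed.add_edge("IMD3", "IMD4")
print(directed.has_edge("IMD3", "IMD4"))  # True
print(directed.has_edge("IMD4", "IMD3"))  # False
```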

Next, I iterate over each row of the node and edge dataframes and add them to the graph as networkx nodes and edges.

interactome = nx.Graph()

# Adding Nodes to Graph
for index, row in interactome_nodes.iterrows():
    interactome.add_nodes_from(row)

# Adding Edges to Graph
for index, row in interactome_edges.iterrows():
    interactome.add_edges_from(row["node1", "node2"]) #### Here is the problem

The problem is in the edge-adding loop. I currently get the following error:

KeyError: ('node1', 'node2')
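The KeyError comes from pandas rather than NetworkX: row["node1", "node2"] asks the row Series for a single label equal to the tuple ("node1", "node2"), and no such label exists. A minimal sketch, assuming a hand-built Series that mimics one row yielded by iterrows():

```python
import pandas as pd

# One row of the edge dataframe, shaped like a row from iterrows().
row = pd.Series({"node1": "IMD3", "node2": "IMD4"})

# The tuple is treated as one label, which is not in the index -> KeyError.
try:
    row["node1", "node2"]
    raised_key_error = False
except KeyError as err:
    raised_key_error = True
    print("KeyError:", err)

# Look up each label separately...
u, v = row["node1"], row["node2"]
print(u, v)  # IMD3 IMD4

# ...or pass a list of labels to get both values at once.
print(row[["node1", "node2"]].tolist())  # ['IMD3', 'IMD4']
```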

I have also tried:

for index, row in interactome_edges.iterrows():
    interactome.add_edges_from((row["node1"],row["node2"]))

and:

for index, row in interactome_edges.iterrows():
    interactome.add_edges_from(row["node1"],row["node2"])

and also simply:

for index, row in interactome_edges.iterrows():
    interactome.add_edges_from(row)

Each of these raises some form of error.
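These attempts fail for a related reason: Graph.add_edges_from(ebunch_to_add, **attr) expects an iterable of edge tuples. Passing (row["node1"], row["node2"]), or the row itself, makes NetworkX iterate over the two protein names and treat each string (for example "MET3") as an edge; a four-character string is neither a 2-tuple nor a 3-tuple, so NetworkX raises a NetworkXError. Passing the two names as separate positional arguments raises a TypeError, because add_edges_from accepts only one positional argument. A minimal sketch of two working alternatives, using a small stand-in dataframe built from the rows shown above:

```python
import networkx as nx
import pandas as pd

# Small stand-in for the question's edge dataframe.
interactome_edges = pd.DataFrame(
    {"node1": ["MET3", "IMD3", "OLE1"], "node2": ["MET3", "IMD4", "OLE1"]}
)

# Option 1: keep the row loop, but add one edge per row with add_edge(u, v).
g1 = nx.Graph()
for _, row in interactome_edges.iterrows():
    g1.add_edge(row["node1"], row["node2"])

# Option 2: call add_edges_from once with an iterable of (u, v) tuples.
g2 = nx.Graph()
g2.add_edges_from(zip(interactome_edges["node1"], interactome_edges["node2"]))

print(sorted(g1.edges()) == sorted(g2.edges()))  # True
print(g1.number_of_edges())  # 3
```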

How can I use my node-to-node dataframe as the edges of a networkx graph?


Solution

  • In [9]: import networkx as nx
    
    In [10]: import pandas as pd
    
    In [11]: df = pd.read_csv("a.csv")
    
    In [12]: df
    Out[12]:
      node1 node2
    0  MET3  MET3
    1  IMD3  IMD4
    2  OLE1  OLE1
    3  MUP1  MUP1
    4  PIS1  PIS1
    
    In [13]: G=nx.from_pandas_edgelist(df, "node1", "node2")
    
    In [14]: [e for e in G.edges]
    Out[14]:
    [('MET3', 'MET3'),
     ('IMD3', 'IMD4'),
     ('OLE1', 'OLE1'),
     ('MUP1', 'MUP1'),
     ('PIS1', 'PIS1')]
    

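    NetworkX can build a graph directly from a pandas DataFrame. Using the edge dataframe from the question, nx.from_pandas_edgelist reads each row as one edge between the value in the source column ("node1") and the value in the target column ("node2"), so no explicit loop is needed.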
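One caveat: from_pandas_edgelist only creates nodes that appear in at least one edge, so a protein listed in the nodes file with no interactions would be missing from the graph. A minimal sketch of adding those nodes afterwards with add_nodes_from, using stand-in dataframes; "XYZ1" is a hypothetical isolated protein, not taken from the question's data:

```python
import networkx as nx
import pandas as pd

# Stand-ins for the question's dataframes. "XYZ1" is hypothetical and has
# no interactions. Column label 1 mirrors the nodes dataframe after the drop.
interactome_edges = pd.DataFrame(
    {"node1": ["MET3", "IMD3", "OLE1"], "node2": ["MET3", "IMD4", "OLE1"]}
)
interactome_nodes = pd.DataFrame({1: ["MET3", "IMD3", "OLE1", "XYZ1"]})

# Only nodes that occur in some edge are created here.
G = nx.from_pandas_edgelist(interactome_edges, source="node1", target="node2")
print("XYZ1" in G)  # False

# Add every protein from the nodes dataframe so isolated ones appear too.
G.add_nodes_from(interactome_nodes[1])
print("XYZ1" in G)  # True
print(G.number_of_nodes(), G.number_of_edges())  # 5 3
```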

    To plot the graph and save it to a file:

    import matplotlib.pyplot as plt

    nx.draw_planar(G, with_labels=True)
    plt.savefig("filename2.png")
    

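Be aware that nx.draw_planar first computes a planar layout and raises a NetworkXException if the graph is not planar, which a large interaction network can easily be. A sketch using a force-directed layout (nx.spring_layout) instead, assuming the complete graph K5 as a stand-in non-planar graph and the file name interactome.png purely for illustration:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import networkx as nx

# K5 (complete graph on 5 nodes) is non-planar; it stands in for a dense network.
G = nx.complete_graph(5)

try:
    nx.draw_planar(G, with_labels=True)
    planar_ok = True
except nx.NetworkXException as err:
    planar_ok = False
    print("draw_planar failed:", err)
plt.close("all")

# spring_layout works for any graph; seed makes the layout reproducible.
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True)
plt.savefig("interactome.png")
plt.close("all")
```

For very large networks, the spring layout can still look crowded; it is one option among the layouts NetworkX provides, not the only one.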