For context: I am making a visual graph for a protein-protein interaction network. A node here corresponds to a protein and an edge would indicate interaction between two nodes.
Here is my code:
First I import all the modules and files that I need:
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd
interactome_edges = pd.read_csv("*a_directory*", delimiter = "\t", header = None)
interactome_nodes = pd.read_csv("*a_directory*", delimiter = "\t", header = None)
# A few adjustments for the dataframes
interactome_nodes = interactome_nodes.drop(columns = [0])
interactome_edges.columns = ["node1","node2"]
Dataframe for nodes looks like this:
      1
0  MET3
1  IMD3
2  OLE1
3  MUP1
4  PIS1
...
Dataframe for edges looks like this:
  node1 node2
0  MET3  MET3
1  IMD3  IMD4
2  OLE1  OLE1
3  MUP1  MUP1
4  PIS1  PIS1
...
Each row describes one edge, going from node1 to node2.
Now I iterate through each row from the node dataframe and edge dataframe and use it as networkx nodes and edges.
interactome = nx.Graph()
# Adding Nodes to Graph
for index, row in interactome_nodes.iterrows():
    interactome.add_nodes_from(row)
# Adding Edges to Graph
for index, row in interactome_edges.iterrows():
    interactome.add_edges_from(row["node1", "node2"])  #### Here is the problem
My problem is in the step that adds the edges. I am currently getting the following error:
KeyError: ('node1', 'node2')
I have also tried:
for index, row in interactome_edges.iterrows():
    interactome.add_edges_from((row["node1"],row["node2"]))
and:
for index, row in interactome_edges.iterrows():
    interactome.add_edges_from(row["node1"],row["node2"])
and also simply:
for index, row in interactome_edges.iterrows():
    interactome.add_edges_from(row)
All of these give me some form of error.
How can I use my node to node dataframe as edges for a networkx graph?
In [9]: import networkx as nx
In [10]: import pandas as pd
In [11]: df = pd.read_csv("a.csv")
In [12]: df
Out[12]:
  node1 node2
0  MET3  MET3
1  IMD3  IMD4
2  OLE1  OLE1
3  MUP1  MUP1
4  PIS1  PIS1
In [13]: G=nx.from_pandas_edgelist(df, "node1", "node2")
In [14]: [e for e in G.edges]
Out[14]:
[('MET3', 'MET3'),
 ('IMD3', 'IMD4'),
 ('OLE1', 'OLE1'),
 ('MUP1', 'MUP1'),
 ('PIS1', 'PIS1')]
NetworkX has functions for reading from a pandas DataFrame. I used the edge DataFrame provided in the question. Here, nx.from_pandas_edgelist builds the graph directly from the node1 and node2 columns of that DataFrame.
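As for why the loops in the question fail: row["node1", "node2"] asks the row for a single label that is the tuple ('node1', 'node2'), which is where the KeyError comes from, and add_edges_from expects an iterable of edge tuples rather than one pair of node names. The sketch below is one way to keep the loop approach working, shown next to the from_pandas_edgelist call. The DataFrames are rebuilt inline from the sample rows in the question instead of being read from the original files, and the names edges_df, nodes_df, G and H are only illustrative.

```python
import networkx as nx
import pandas as pd

# Sample rows copied from the question; the real data would come from read_csv.
edges_df = pd.DataFrame(
    {
        "node1": ["MET3", "IMD3", "OLE1", "MUP1", "PIS1"],
        "node2": ["MET3", "IMD4", "OLE1", "MUP1", "PIS1"],
    }
)
nodes_df = pd.DataFrame({1: ["MET3", "IMD3", "OLE1", "MUP1", "PIS1"]})

# Option 1: build the graph from the edge table in one call.
G = nx.from_pandas_edgelist(edges_df, source="node1", target="node2")

# Nodes that appear in no edge are not created by the call above,
# so add the node column as well (already-present nodes are left unchanged).
G.add_nodes_from(nodes_df[1])

# Option 2: the loop from the question, using add_edge for one edge per row.
H = nx.Graph()
H.add_nodes_from(nodes_df[1])
for row in edges_df.itertuples(index=False):
    H.add_edge(row.node1, row.node2)
```

Both graphs should end up with the same nodes and edges. The one-call version is usually the simpler choice when the edges are already in a DataFrame.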
The graph can then be plotted and saved to a file:
import matplotlib.pyplot as plt
nx.draw_planar(G, with_labels = True)
plt.savefig("filename2.png")
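One caveat about nx.draw_planar: it relies on a planar layout, which raises an exception when the graph is not planar, and a full interaction network may well not be. A minimal sketch of a fallback, assuming a force-directed layout is acceptable in that case, could look like this. choose_layout is a hypothetical helper, not part of NetworkX, and the complete graph on five nodes is used only as a small non-planar example.

```python
import networkx as nx


def choose_layout(G, seed=42):
    """Return (layout_name, positions) for G.

    Hypothetical helper: uses the planar layout when G is planar and
    falls back to a spring (force-directed) layout otherwise.
    """
    is_planar, _ = nx.check_planarity(G)
    if is_planar:
        return "planar", nx.planar_layout(G)
    # seed makes the spring layout reproducible between runs
    return "spring", nx.spring_layout(G, seed=seed)


# The complete graph on 5 nodes is not planar, so the fallback is used here.
name, pos = choose_layout(nx.complete_graph(5))
```

The returned positions can be passed on to the drawing call, for example nx.draw(G, pos=pos, with_labels=True), before saving with plt.savefig as above.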