I am trying to create a graph using networkx. So far I have created nodes from the following text files. File 1 (user_id.txt), sample data:
user_000001
user_000002
user_000003
user_000004
user_000005
user_000006
user_000007
File 2 (user_country.txt), sample data. It also contains a few blank lines, for users who did not enter their country:
Japan
Peru
United States
Bulgaria
Russian Federation
United States
File 3 (user_agegroup.txt), data. It contains four age groups:
[12-18],[19-25],[26-32],[33-39]
I have two other files, with the following sample data, for adding edges to the graph:
File 4 (id,agegroup.txt)
user_000001,[19-25]
user_000002,[19-25]
user_000003,[33-39]
user_000004,[19-25]
user_000005,[19-25]
user_000006,[19-25]
user_000007,[26-32]
File 5 (id,country.txt)
(user_000001,Japan)
(user_000002,Peru)
(user_000003,United States)
(user_000004,)
(user_000005,Bulgaria)
(user_000006,Russian Federation)
(user_000007,United States)
So far I have written the following code, which draws a graph with nodes only.
(Please check the code: print g.number_of_nodes() never prints the correct number of nodes, although print g.nodes() shows the correct number of nodes.)
import csv
import networkx as nx
import matplotlib.pyplot as plt

g=nx.Graph()

#extract and add AGE_GROUP nodes in graph
f1 = csv.reader(open("user_agegroup.txt","rb"))
for row in f1:
    g.add_nodes_from(row)
nx.draw_circular(g,node_color='blue')

#extract and add COUNTRY nodes in graph
f2 = csv.reader(open('user_country.txt','rb'))
for row in f2:
    g.add_nodes_from(row)
nx.draw_circular(g,node_color='red')

#extract and add USER_ID nodes in graph
f3 = csv.reader(open('user_id.txt','rb'))
for row in f3:
    g.add_nodes_from(row)
nx.draw_random(g,node_color='yellow')

print g.nodes()
plt.savefig("path.png")
print g.number_of_nodes()
plt.show()
Besides this, I can't figure out how to add edges from File 4 and File 5. Any help with code for that is appreciated. Thanks.
To simplify things, I changed the user IDs to [1,2,3,4,5,6,7] in the user_id.txt and id,country.txt files. There are some problems in your code:
1- You first add some nodes to the graph (for instance from the user_id.txt file) and draw it, then you add more nodes from another file and draw the whole graph again on the same figure. So in the end you have several graphs drawn in one figure.
2- You call draw_circular twice, which is why the blue nodes never appear: they are drawn over by the red nodes.
I changed your code so that it draws only once, at the end. To draw the nodes in the required colors, I added an attribute called color when adding nodes, then used this attribute to build a color map, which I pass to the draw_networkx function. Finally, adding edges was a bit tricky because of the empty field in id,country.txt: it creates a node with an empty name, so I had to remove that node before drawing the graph. Here is the code and the figure that appears afterwards.
import csv
import networkx as nx
import matplotlib.pyplot as plt

G=nx.Graph()

# Files are opened in "rb" mode as in the question (Python 2);
# on Python 3, open them in text mode with newline='' instead.

#extract and add AGE_GROUP nodes in graph
f1 = csv.reader(open("user_agegroup.txt","rb"))
for row in f1:
    G.add_nodes_from(row, color = 'blue')

#extract and add COUNTRY nodes in graph
f2 = csv.reader(open('user_country.txt','rb'))
for row in f2:
    G.add_nodes_from(row, color = 'red')

#extract and add USER_ID nodes in graph
f3 = csv.reader(open('user_id.txt','rb'))
for row in f3:
    G.add_nodes_from(row, color = 'yellow')

f4 = csv.reader(open('id,agegroup.txt','rb'))
for row in f4:
    if len(row) == 2 : # add an edge only if both values are provided
        G.add_edge(row[0],row[1])

f5 = csv.reader(open('id,country.txt','rb'))
for row in f5:
    if len(row) == 2 : # add an edge only if both values are provided
        G.add_edge(row[0],row[1])

# Remove empty nodes (iterate over a copy, since the graph is modified in the loop)
for n in list(G.nodes()):
    if n == '':
        G.remove_node(n)

# color nodes according to their color attribute
# (G.node was removed in networkx 2.4; use G.nodes[n] there)
color_map = []
for n in G.nodes():
    color_map.append(G.node[n]['color'])

nx.draw_networkx(G, node_color = color_map, with_labels = True, node_size = 500)
plt.savefig("path.png")
plt.show()
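As a variation on the code above, the empty country field can be filtered out while reading the file, so the empty node is never created. This is only a sketch under a few assumptions: it targets Python 3, it expects the parenthesized layout shown for File 5 in the question, and the helper names read_pairs and build_color_map as well as the 'gray' fallback color are made up for illustration. It uses only the standard library, with an in-memory sample standing in for the real file.

```python
import csv
import io


def read_pairs(lines):
    """Yield (left, right) pairs from comma-separated lines.

    Hypothetical helper: strips surrounding parentheses and whitespace,
    and skips any line where either field is missing or empty.
    """
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # blank or malformed line
        left = row[0].strip().lstrip("(").strip()
        right = row[1].strip().rstrip(")").strip()
        if left and right:
            yield left, right


def build_color_map(nodes_with_data, default="gray"):
    """Return one color per node, using `default` when no color is set.

    Hypothetical helper: expects (node, attribute_dict) pairs.
    """
    return [data.get("color", default) for _, data in nodes_with_data]


# Stand-in for open("id,country.txt"): a few lines in the File 5 layout.
sample = io.StringIO(
    "(user_000003,United States)\n"
    "(user_000004,)\n"
    "(user_000005,Bulgaria)\n"
)
edges = list(read_pairs(sample))
print(edges)

# Stand-in for node data: one node with a color attribute, one without.
nodes_with_data = [("Japan", {"color": "red"}), ("Peru", {})]
colors = build_color_map(nodes_with_data)
print(colors)
```

With a real graph, the pairs could be passed to G.add_edges_from(read_pairs(f)), and the color list built with build_color_map(G.nodes(data=True)). The fallback color matters because add_edge creates any node it has not seen before, and such a node has no color attribute.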