I have a large dataset that compares products using a relatedness measure. It looks like this:
product1 product2 relatedness
0101 0102 0.047619
0101 0103 0.023810
0101 0104 0.095238
0101 0105 0.214286
0101 0106 0.047619
... ... ...
I used the following code to feed the data into the NetworkX graphing tool and produce an MST diagram:
import networkx as nx
import matplotlib.pyplot as plt
products = (data['product1'])
products = list(dict.fromkeys(products))
products = sorted(products)
G = nx.Graph()
G.add_nodes_from(products)
print(G.number_of_nodes())
print(G.nodes())
row = 0
for c in data['product1']:
    p = data['product2'][row]
    w = data['relatedness'][row]
    if w > 0:
        G.add_edge(c,p, weight=w, with_labels=True)
    row = row + 1
nx.draw(nx.minimum_spanning_tree(G), with_labels=True)
plt.show()
The resulting diagram looks like this: https://i.sstatic.net/LBrnD.jpg
However, when I re-run the code with the same data and no modifications, the arrangement of the clusters changes. One example is here: https://i.sstatic.net/jR62Q.jpg, and a second is here: https://i.sstatic.net/PLHyo.jpg. The clusters, edges, and weights do not appear to change, but their arrangement in the plot changes each time.
What causes the arrangement of the nodes to change each time without any changes to the code or data? How can I re-write this code to produce a network diagram with approximately the same arrangement of nodes and edges for the same data each time?
The nx.draw method uses spring_layout by default (see the documentation). This layout implements the Fruchterman-Reingold force-directed algorithm, which starts from random initial positions. That randomness is the effect you are seeing across your repeated runs.
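If you want to "fix" the positions, you should explicitly call the spring_layout function, specify the initial positions in the pos argument, and then pass the resulting positions to nx.draw through its own pos argument. Alternatively, pass a fixed value for the seed argument of spring_layout so that the random initial positions are the same on every run.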
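As a minimal sketch of both options, the snippet below builds a graph from the five sample rows shown in the question and computes the layout twice for each approach. It assumes a NetworkX version in which spring_layout accepts a seed argument. The seed value 42 is arbitrary, the use of circular_layout for the initial positions is just one convenient deterministic choice, and draw_mst is a hypothetical helper name, not part of NetworkX.

```python
import networkx as nx

# The five sample rows from the question: (product1, product2, relatedness).
edges = [
    ("0101", "0102", 0.047619),
    ("0101", "0103", 0.023810),
    ("0101", "0104", 0.095238),
    ("0101", "0105", 0.214286),
    ("0101", "0106", 0.047619),
]

G = nx.Graph()
G.add_weighted_edges_from(edges)  # stores relatedness as the 'weight' attribute
mst = nx.minimum_spanning_tree(G)

# Option 1: a fixed seed makes the random initial positions repeatable.
pos_seed_a = nx.spring_layout(mst, seed=42)
pos_seed_b = nx.spring_layout(mst, seed=42)

# Option 2: supply deterministic initial positions for every node via `pos`.
pos_init_a = nx.spring_layout(mst, pos=nx.circular_layout(mst))
pos_init_b = nx.spring_layout(mst, pos=nx.circular_layout(mst))


def draw_mst(tree, pos):
    """Hypothetical helper: draw the tree at the precomputed positions."""
    import matplotlib.pyplot as plt  # imported here so the layout code runs without it

    nx.draw(tree, pos=pos, with_labels=True)
    plt.show()
```

Calling draw_mst(mst, pos_seed_a) in place of the original nx.draw(...) line should then give the same picture on each run, as long as the data, the seed, and the NetworkX version stay the same.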