python network-programming visualization networkx

Network visualization: how to align nodes and draw a simpler graph?


I've been working on visualizing relations between application genres. It's not exactly a 'network', but I would like to draw a network graph.

There are 32 genres, and the relation between each pair of genres is recorded like this:

genre_pt.most_common(20)

[(('Personalization', 'Communication'), 22274),
 (('Personalization', 'Social'), 9774),
 (('Communication', 'Personalization'), 8393),
 (('Communication', 'Communication'), 6244),
 (('Lifestyle', 'Health & Fitness'), 4142),
 (('Health & Fitness', 'Communication'), 3737),
 (('Tools', 'Communication'), 3584),
 (('Personalization', 'Tools'), 3082),
 (('Social', 'Personalization'), 2767),
 (('Personalization', 'Books & Reference'), 2662),
 (('Personalization', 'Health & Fitness'), 2548),
 (('Education', 'Communication'), 2530),
 (('Personalization', 'Education'), 2376),
 (('Social', 'Communication'), 2297),
 (('Personalization', 'Personalization'), 2285),
 (('Social', 'Health & Fitness'), 2261),
 (('Personalization', 'Finance'), 1985),
 (('Communication', 'Social'), 1926),
 (('Personalization', 'Lifestyle'), 1829),
 (('Communication', 'Tools'), 1729)]

I want to make a directed network graph. In each entry, the first value of the tuple is the node the edge starts from, the second value is the node it arrives at, and the final number is the weight of the edge between the two nodes.

So far I've managed to make plots with pyvis and networkx using the code below, but with so many possible edges (32 genres on each side, so 32*32 = 1024 pairs) the plots are not clear.

from pyvis.network import Network
import networkx as nx

net = Network(notebook=True)

for gen in set(genre_dict.values()):  # add one node per genre
    net.add_node(gen, label=gen)

for k, v in genre_pt.items():
    if not all(k):  # skip pairs where either genre is missing
        continue
    net.add_edge(k[0], k[1], weight=v)  # add a weighted edge between the two genres

ngx = nx.complete_graph(5)
net.from_nx(ngx)
net.show("example.html")

[image: resulting pyvis plot]

import matplotlib.pyplot as plt
import networkx as nx

G = nx.DiGraph()

for k, v in genre_pt.items():
    G.add_edge(k[0], k[1], weight=v)

pos = nx.spring_layout(G)

nx.draw_networkx_nodes(G, pos, node_size=700)
# edge width is proportional to the raw weight
edge_width = [0.15 * G[u][v]['weight'] for u, v in G.edges()]

graph = nx.draw_networkx(G, pos,
                         alpha=0.7,
                         with_labels=True, width=edge_width,
                         edge_color='.4', cmap=plt.cm.Blues)

[image: resulting networkx plot]

I would like to see the directed relations between nodes, and how strong each weight is, in a clear way.

Ideally, I would get a graph that looks like this:

[image: desired graph]

or at least like this, but clearer:

[image: acceptable alternative]

I would appreciate it if there's anybody who can help me out with this problem. Thank you, in advance! :D


Solution

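  • Here is one solution. Because networkx treats nodes with identical string names as the same node, a genre that appears as both a source and a target would collapse into a single node. My solution was to use integers for the nodes and apply the genre names as labels only at plot time, via a dictionary mapping. I then calculated a custom dictionary of positions that puts the source nodes on a top row and the target nodes on a bottom row.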
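
As a minimal sketch of why the integer IDs are needed (the two-edge sample and the names `pairs`, `merged`, and `split` are illustrative, not part of the original answer): with string nodes, a genre used as both source and target becomes one node, while offsetting the target IDs keeps the two roles apart.

```python
import networkx as nx

pairs = [("Personalization", "Communication"),
         ("Communication", "Personalization")]

# String nodes: the source and target copies of a genre are the same node.
merged = nx.DiGraph()
merged.add_edges_from(pairs)
print(merged.number_of_nodes())  # 2

# Integer IDs: sources use 0..n-1 and targets use n..2n-1,
# so every genre appears once per side.
genres = sorted({g for pair in pairs for g in pair})
ids = {g: i for i, g in enumerate(genres)}
n = len(genres)
split = nx.DiGraph()
split.add_edges_from([(ids[s], ids[t] + n) for s, t in pairs])
print(split.number_of_nodes())  # 4
```
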

    Also note that I renamed the graph to DG since this is the naming convention for directed graphs.

    Unfortunately, matplotlib's arrowheads look odd on very thick lines, and judging by a related SO question, not much can be done about it other than manually adjusting the relevant parameters.
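
One way to do that manual adjustment, sketched under assumptions: the tiny three-edge graph, the width cap of 6, and the specific `arrowsize` and margin values below are illustrative guesses, not tuned settings from the original answer. The idea is to draw the edges separately with `nx.draw_networkx_edges`, cap the line width, and enlarge the arrowhead.

```python
import matplotlib.pyplot as plt
import networkx as nx

DG = nx.DiGraph()
DG.add_weighted_edges_from([(0, 2, 22274), (0, 3, 9774), (1, 2, 3584)])
pos = {0: (-0.5, 0.5), 1: (0.5, 0.5), 2: (-0.5, -0.5), 3: (0.5, -0.5)}

weights = [DG[u][v]["weight"] for u, v in DG.edges()]
max_width = 6  # illustrative cap; thinner lines keep the arrowheads recognizable
widths = [max_width * w / max(weights) for w in weights]

fig, ax = plt.subplots(figsize=(6, 4))
nx.draw_networkx_nodes(DG, pos, node_size=1500, ax=ax)
arrows = nx.draw_networkx_edges(
    DG, pos, ax=ax,
    width=widths,
    arrowstyle="-|>",     # filled arrowhead
    arrowsize=25,         # scales the arrowhead's length and width
    node_size=1500,       # should match the drawn nodes so arrows stop at their border
    min_target_margin=5,  # extra gap between the arrow tip and the target node
    edge_color=".4",
)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result; the values usually need some trial and error for a given figure size.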

    First the output, then the copy-pastable code:

    [image: aligned_digraph]

    import networkx as nx
    import matplotlib.pyplot as plt
    import numpy as np
    
    genre_pt = [(('Personalization', 'Communication'), 22274),
                (('Personalization', 'Social'), 9774),
                (('Communication', 'Personalization'), 8393),
                (('Communication', 'Communication'), 6244),
                (('Lifestyle', 'Health & Fitness'), 4142),
                (('Health & Fitness', 'Communication'), 3737),
                (('Tools', 'Communication'), 3584),
                (('Personalization', 'Tools'), 3082),
                (('Social', 'Personalization'), 2767),
                (('Personalization', 'Books & Reference'), 2662),
                (('Personalization', 'Health & Fitness'), 2548),
                (('Education', 'Communication'), 2530),
                (('Personalization', 'Education'), 2376),
                (('Social', 'Communication'), 2297),
                (('Personalization', 'Personalization'), 2285),
                (('Social', 'Health & Fitness'), 2261),
                (('Personalization', 'Finance'), 1985),
                (('Communication', 'Social'), 1926),
                (('Personalization', 'Lifestyle'), 1829),
                (('Communication', 'Tools'), 1729)]
    
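    # Collect every genre that appears as a source or as a target.
    G1_keys = {k[0] for k, _ in genre_pt}
    G2_keys = {k[1] for k, _ in genre_pt}
    G_keys = sorted(G1_keys | G2_keys)  # sorted so the layout is the same on every run
    num_keys = len(G_keys)
    G_mapping = {k: v for v, k in enumerate(G_keys)}      # genre -> integer ID
    G_rev_mapping = {k: v for k, v in enumerate(G_keys)}  # integer ID -> genre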
    
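    # Source nodes keep IDs 0..num_keys-1; target IDs are shifted by num_keys,
    # so each genre gets one node on the source side and one on the target side.
    edge_list = []
    for edge, weight in genre_pt:
        mapped_edge = (G_mapping[edge[0]], G_mapping[edge[1]] + num_keys, weight)
        edge_list.append(mapped_edge)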
    
    node_labels = {k: v for k, v in G_rev_mapping.items()}
    node_labels.update({k + num_keys: v for k, v in G_rev_mapping.items()})
    
    DG = nx.DiGraph()
    
    DG.add_weighted_edges_from(edge_list)
    # Add every labeled node, including genres that have no edges on one side.
    DG.add_nodes_from(node_labels)
    
    # Top row (y = 0.5) holds the source nodes, bottom row (y = -0.5) the targets.
    x_spacing = np.linspace(-0.8, 0.8, num_keys)
    pos = {}
    for node in node_labels.keys():
        x = x_spacing[node] if node < num_keys else x_spacing[node - num_keys]
        y = 0.5 if node < num_keys else -0.5
        pos[node] = (x, y)
    
    edge_width = [DG[u][v]['weight'] for u, v in DG.edges()]
    normalized_edge_width = [10 * width / max(edge_width) for width in edge_width]
    
    plt.figure(1, figsize=(24, 8))
    # draw_networkx returns None, so there is nothing to assign
    nx.draw_networkx(DG, pos,
                     alpha=0.7,
                     with_labels=True, width=normalized_edge_width,
                     edge_color='.4', node_size=4000, labels=node_labels,
                     arrowstyle='->,head_width=0.6,head_length=0.5')
    plt.show()
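
For reuse with the full set of 32 genres, the position calculation above could be wrapped in a small helper. This is only a sketch: the function name `two_row_positions` and its keyword parameters are hypothetical, and plain arithmetic stands in for `np.linspace` so the snippet has no dependencies.

```python
def two_row_positions(num_keys, x_min=-0.8, x_max=0.8, y_top=0.5, y_bottom=-0.5):
    """Return {node_id: (x, y)} for 2 * num_keys nodes laid out in two rows.

    IDs 0..num_keys-1 go on the top row (sources) and IDs
    num_keys..2*num_keys-1 sit directly below them (targets).
    """
    if num_keys < 1:
        raise ValueError("num_keys must be at least 1")
    # Even spacing between x_min and x_max; a single node sits at x_min.
    step = (x_max - x_min) / (num_keys - 1) if num_keys > 1 else 0.0
    pos = {}
    for i in range(num_keys):
        x = x_min + i * step
        pos[i] = (x, y_top)
        pos[i + num_keys] = (x, y_bottom)
    return pos


pos = two_row_positions(3)
print(sorted(pos))  # [0, 1, 2, 3, 4, 5]
```

The returned dictionary can be passed as `pos` to `nx.draw_networkx` in place of the loop above, provided the node IDs follow the same source/target offset scheme.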