python, networkx

Create an undirected graph in NetworkX in Python from a pandas DataFrame


I am new to the NetworkX package in Python, and I want to solve the following problem.

Let's say this is my data set:

import pandas as pd 
d = {'label': [1, 2, 3, 4, 5], 'size': [10, 8, 6, 4, 2], 'dist': [0, 2, -2, 4, -4]}
df = pd.DataFrame(data=d)
df 

label and size in the df are quite self-explanatory. The dist column measures the distance from the biggest label (label 1) to the rest of the labels. Hence dist is 0 in the case of label 1.

I want to produce something similar to the picture below:

[image]

The label with the biggest size (label 1) should sit in the central position. The edges represent the distance from label 1 to every other label, and the node sizes are proportional to the size of each label. Is this possible?

Thank you very much in advance. Please let me know if the question is unclear.


Solution

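  • import matplotlib.pyplot as plt
    import networkx as nx
    
    # df is the DataFrame defined in the question
    G = nx.Graph()
    for _, row in df.iterrows():
        # place every node on a horizontal line, with dist as its x coordinate
        G.add_node(row['label'], pos=(row['dist'], 0), size=row['size'])
    
    # connect the central (biggest) node to every other node
    biggest_node = 1
    for node in list(G.nodes):
        if node != biggest_node:
            G.add_edge(biggest_node, node)
    
    nx.draw(G,
            pos={node: attrs['pos'] for node, attrs in G.nodes.items()},
            # scale the stored sizes up so the nodes are clearly visible
            node_size=[attrs['size'] * 100 for attrs in G.nodes.values()],
            with_labels=True
            )
    plt.show()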
    

    Which plots:

    [image]

    Notes:

    You will notice that the edges 1-3 and 1-2 look thicker, because they overlap with sections of the edges 1-5 and 1-4 respectively. You can address that by drawing only one edge from the center to the furthest node in each direction. Since every node lies on the same line, the result looks the same.

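    # (x coordinate, node) pairs, so min/max pick the outermost node on each side
    coords = [(attrs['pos'][0], node) for node, attrs in G.nodes.items()]
    nx.draw(G,
            pos={node: attrs['pos'] for node, attrs in G.nodes.items()},
            node_size=[attrs['size'] * 100 for attrs in G.nodes.values()],
            with_labels=True,
            edgelist=[(biggest_node, min(coords)[1]), (biggest_node, max(coords)[1])]
            )
    plt.show()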
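
The min/max selection assumes there are nodes on both sides of the center. If every node sat on one side, one of the two edges would run from the center to itself. A minimal sketch of a more defensive version follows; the helper name `outer_edges` and the plain dict input are illustrative assumptions, not part of NetworkX:

```python
def outer_edges(x_by_node, center):
    """Return edges from `center` to the furthest node on each side.

    x_by_node maps node -> x coordinate (a hypothetical input, e.g. built
    from the 'pos' node attribute). A side with no nodes yields no edge.
    """
    cx = x_by_node[center]
    # (x, node) pairs, so min/max compare by x coordinate first
    left = [(x, n) for n, x in x_by_node.items() if x < cx]
    right = [(x, n) for n, x in x_by_node.items() if x > cx]
    edges = []
    if left:
        edges.append((center, min(left)[1]))
    if right:
        edges.append((center, max(right)[1]))
    return edges


# x coordinates taken from the 'dist' column of the question's data
x_by_node = {1: 0, 2: 2, 3: -2, 4: 4, 5: -4}
print(outer_edges(x_by_node, center=1))  # [(1, 5), (1, 4)]
```

The returned list can be passed as the `edgelist` argument in the `nx.draw` call.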
    

    The factor of 100 in the list passed to node_size is just a scaling factor. You can change it to whatever suits your plot.
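
One thing to keep in mind when picking the factor: to my knowledge, NetworkX hands `node_size` to matplotlib's scatter plot, which treats the value as marker area (in points squared), not diameter. Multiplying by a constant therefore makes the area proportional to `size`. If you would rather have the diameter proportional, square the value first. A small sketch, where the helper name `scaled_node_sizes` and its `by` parameter are made up for illustration:

```python
def scaled_node_sizes(sizes, scale=100, by="area"):
    """Turn raw size values into values for the node_size argument.

    by="area": marker area grows linearly with the raw value (this is
    what the plain factor of 100 does).
    by="diameter": the raw value is squared first, so the marker
    diameter grows linearly instead.
    """
    if by == "area":
        return [s * scale for s in sizes]
    if by == "diameter":
        return [s ** 2 * scale for s in sizes]
    raise ValueError("by must be 'area' or 'diameter'")


sizes = [10, 8, 6, 4, 2]  # the 'size' column from the question
print(scaled_node_sizes(sizes))                  # [1000, 800, 600, 400, 200]
print(scaled_node_sizes(sizes, 10, "diameter"))  # [1000, 640, 360, 160, 40]
```

Either list can be passed directly as `node_size=` in the `nx.draw` call.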