python-3.x pandas networkx social-networking

Networkx not showing all nodes in dataframe


I'm building a social network from an exchange of emails. The dataset is a CSV file, downloadable from my Google Drive, in which each row connects one individual (an integer, column source) to another individual (an integer, column target): https://drive.google.com/file/d/183fIXkGUqDC7YGGdxy50jAPrekaI1273/view?usp=sharing

The point is, my dataframe has 400 rows, but only 21 nodes show up in the plot.

Here is the sample code:

import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
import pandas as pd


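# Load the edge list (no header row) and keep the first 400 rows
df = pd.read_csv('/home/......./social.csv', sep=',', header=None)
df = df.iloc[0:400, :]
df.columns = ['source', 'target']

# Build the graph from the edges, then add nodes 0..399 explicitly
nodes = np.arange(0, 400)
G = nx.from_pandas_edgelist(df, "source", "target")
G.add_nodes_from(nodes)

# Compute a spectral layout and collect the coordinates as an (n, 2) array
pos = nx.spectral_layout(G)
coordinates = np.concatenate(list(pos.values())).reshape(-1, 2)

# Draw the edges and the nodes
nx.draw_networkx_edges(G, pos, edgelist=[e for e in G.edges], alpha=0.9)
nx.draw_networkx_nodes(G, pos, nodelist=nodes)
plt.show()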

Column source has 160 different individuals and target has 260 different individuals.

The rest of the code runs correctly; the missing nodes are the only issue.

I'm wondering what I'm doing wrong. Any insights are welcome.


Solution

  • Your nodes are being drawn, but nx.spectral_layout positions many of them on top of each other.

    If you print the positions:

    pos = nx.spectral_layout(G)
    print(pos)
    

    You get:

    {0: array([0.00927318, 0.01464153]), 1: array([0.00927318, 0.01464153]), 2: array([0.00927318, 0.01464153]), 3: array([0.00927318, 0.01464153]), 4: array([0.00927318, 0.01464153]), 5: array([-1.        , -0.86684471]), 6: array([-1.        , -0.86684471]), ...
    

    Comparing the positions already shows the overlap: nodes 0 to 4 share one coordinate, and nodes 5 and 6 share another.
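
    If you would rather count the overlaps than read them off the printout, something like the following might help. It is only a minimal sketch: group_overlapping_nodes is a helper name made up for this example, the rounding to 6 decimals is an arbitrary choice, and the small graph is a hypothetical stand-in, not your dataset.

```python
from collections import defaultdict

import networkx as nx
import numpy as np


def group_overlapping_nodes(pos, decimals=6):
    """Map each rounded (x, y) coordinate to the list of nodes placed there.

    Hypothetical helper for illustration; not part of networkx.
    """
    groups = defaultdict(list)
    for node, xy in pos.items():
        key = tuple(np.round(np.asarray(xy, dtype=float), decimals))
        groups[key].append(node)
    return dict(groups)


# Stand-in graph (an assumption, not the real data):
# a few edges plus several isolated nodes.
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (3, 4)])
G.add_nodes_from(range(5, 10))

pos = nx.spectral_layout(G)
groups = group_overlapping_nodes(pos)

print(f"{G.number_of_nodes()} nodes occupy {len(groups)} distinct positions")
for xy, members in groups.items():
    if len(members) > 1:
        print(f"{len(members)} nodes share position {xy}: {members}")
```

    Running the same helper on the pos from your own graph should show how many distinct positions the layout actually produced.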

    You could instead use nx.circular_layout if you want to see all the nodes:

    fig=plt.figure(figsize=(16,12))
    pos = nx.circular_layout(G)
    nx.draw(G, pos, nodelist=nodes,node_size=40)
    

    With the circular layout the nodes are spaced evenly around a circle, so none of them overlap.
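
    For a version that runs end to end without the CSV, here is a minimal sketch of the circular-layout approach. The random edge list is a hypothetical stand-in for social.csv (the ID ranges are assumptions chosen only to resemble the column counts in the question), the Agg backend is used so that it runs without a display, and the output file name is made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import pandas as pd

# Hypothetical stand-in for social.csv: 400 random edges between integer IDs.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "source": rng.integers(0, 160, size=400),
        "target": rng.integers(160, 420, size=400),
    }
)

# Every ID that appears in either column becomes a node.
G = nx.from_pandas_edgelist(df, "source", "target")

# circular_layout spaces the nodes evenly on a circle.
pos = nx.circular_layout(G)

# Sanity check: count distinct coordinates after rounding.
distinct = {tuple(np.round(xy, 6)) for xy in pos.values()}
print(f"{G.number_of_nodes()} nodes, {len(distinct)} distinct positions")

fig = plt.figure(figsize=(16, 12))
nx.draw(G, pos, node_size=40)
fig.savefig("circular_layout.png")  # hypothetical output file name
```

    In an interactive session you can drop the matplotlib.use("Agg") line and call plt.show() instead of saving the figure.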