Adjacency matrix from pandas dataframe in Python

Below is a small example of what I am trying to do in Python. I am working with a network of 15,000 distinct nodes. The data comes from a pandas DataFrame:

Node   Target       Node_Attrib
mom    dad          0.2
mom    grandmother  0.12
mom    grandfather  0.24
mom    Lucy         0.2
dad    mom          0.4
dad    Lucy         0.3
Lucy   mom          0.1
Lucy   dad          0.3
Lucy   Mark         0.1
Lucy   grandmother  0.2
Lucy   grandfather  0.1

The network is created as follows:

G = nx.from_pandas_edgelist(df, 'Node', 'Target', ['Node_Attrib'])

where nx is networkx. Since I would like to perform some analysis, I need the adjacency matrix. I am thinking of using crosstab for that:

adj = pd.crosstab(df.Node, df.Target)
idx = adj.columns.union(adj.index)
adj = adj.reindex(index=idx, columns=idx, fill_value=0)

I am wondering whether this is the best way to get the adjacency matrix in Python, especially given the number of nodes in the network. Is there a different approach that copes better with thousands of nodes (and edges)?

Solution

First of all, nx.from_pandas_edgelist() creates an undirected graph by default. That means the edge (mom, Lucy) is first given the value 0.2, from the first row in which it appears in your table. When the (Lucy, mom) row is parsed, the same edge is overwritten with the new value, 0.1.

>>> G.get_edge_data('mom', 'Lucy')
{'Node_Attrib': 0.1}

For a directed graph, change the line to

G = nx.from_pandas_edgelist(df, 'Node', 'Target', ['Node_Attrib'], create_using=nx.DiGraph())
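To see the difference side by side, here is a minimal sketch on a three-row subset of your table (the subset and the variable names undirected and directed are just for illustration):

```python
import networkx as nx
import pandas as pd

# Three rows from the question's table, including both directions of mom/Lucy.
df = pd.DataFrame(
    {
        "Node": ["mom", "mom", "Lucy"],
        "Target": ["dad", "Lucy", "mom"],
        "Node_Attrib": [0.2, 0.2, 0.1],
    }
)

# Default: undirected graph, so (mom, Lucy) and (Lucy, mom) are the same edge.
undirected = nx.from_pandas_edgelist(df, "Node", "Target", ["Node_Attrib"])

# Directed graph: each row becomes its own edge.
directed = nx.from_pandas_edgelist(
    df, "Node", "Target", ["Node_Attrib"], create_using=nx.DiGraph
)

print(undirected.get_edge_data("mom", "Lucy"))  # value from the later row
print(directed.get_edge_data("mom", "Lucy"))    # value from the (mom, Lucy) row
print(directed.get_edge_data("Lucy", "mom"))    # value from the (Lucy, mom) row
```

The undirected graph ends up with two edges and the directed one with three, so the directed version is the one that keeps every row of the table.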

NetworkX has the function nx.adjacency_matrix(), which creates a SciPy sparse matrix. This saves memory when most pairs of nodes are not connected by an edge.

>>> adj = nx.adjacency_matrix(G, weight='Node_Attrib')
>>> adj[0,1]    # (mom, dad) edge as the node ordering is taken from `G.nodes`
0.2
>>> array = adj.todense()   # if for some reason you need the whole matrix
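If you would rather not rely on the implicit ordering of G.nodes, you can pass a nodelist and keep your own label-to-index mapping. A minimal sketch, again on a small subset of your data (the names nodes, index and labelled are my own, not part of networkx):

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame(
    {
        "Node": ["mom", "mom", "Lucy"],
        "Target": ["dad", "Lucy", "mom"],
        "Node_Attrib": [0.2, 0.2, 0.1],
    }
)
G = nx.from_pandas_edgelist(
    df, "Node", "Target", ["Node_Attrib"], create_using=nx.DiGraph
)

# Fix the row/column order explicitly instead of relying on G.nodes.
nodes = sorted(G.nodes)
index = {node: i for i, node in enumerate(nodes)}

# Sparse adjacency matrix: only existing edges are stored.
adj = nx.adjacency_matrix(G, nodelist=nodes, weight="Node_Attrib")
mom_to_dad = adj[index["mom"], index["dad"]]

# Dense, labelled view; only sensible for small graphs.
labelled = pd.DataFrame(adj.toarray(), index=nodes, columns=nodes)
print(labelled)
```

For your 15,000 nodes, a dense float64 matrix would take about 1.8 GB (15,000 x 15,000 x 8 bytes), so I would stay with the sparse form and only densify small sub-blocks.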

As the documentation of that function states, you can also create a pure Python equivalent of a sparse matrix with a dict-of-dicts. But if you want to perform some analysis, I suspect the array option from above will be more suitable for you.

>>> adj = nx.convert.to_dict_of_dicts(G)
>>> adj['mom']['Lucy']['Node_Attrib']
0.2

This would need a bit of clean-up so that adj[node1][node2] gives you the edge value directly. You would also need to look values up with adj.get(node1, {}).get(node2, 0.) to avoid a KeyError for missing edges.
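One possible way to do that clean-up, as a sketch rather than anything networkx provides out of the box (weights and edge_weight are hypothetical names I made up for this example):

```python
import networkx as nx

# Small directed graph with the same attribute name as in the question.
G = nx.DiGraph()
G.add_edge("mom", "Lucy", Node_Attrib=0.2)
G.add_edge("Lucy", "mom", Node_Attrib=0.1)
G.add_edge("mom", "dad", Node_Attrib=0.2)

# Nested dict: node -> neighbour -> attribute dict.
nested = nx.to_dict_of_dicts(G)

# Flatten the innermost level so weights[u][v] is the number itself.
weights = {
    source: {target: attrs["Node_Attrib"] for target, attrs in targets.items()}
    for source, targets in nested.items()
}


def edge_weight(source, target, default=0.0):
    """Return the weight of (source, target), or default if there is no such edge."""
    return weights.get(source, {}).get(target, default)


print(edge_weight("mom", "Lucy"))
print(edge_weight("Lucy", "dad"))  # no such edge, falls back to the default
```

Nodes without outgoing edges (dad here) still get an empty inner dict, and the double .get() also covers nodes that are not in the graph at all.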