In this example, a network built from a dataframe df has 4 nodes and 5 links (each link is listed once per direction):
N T P
1 2 B
1 3 B
2 1 A
2 3 A
2 4 A
3 1 B
3 2 B
3 4 B
4 2 A
4 3 A
The nodes have the following properties (summary):
N P
1 B
2 A
3 B
4 A
For each node, I'd need to calculate the proportion of neighbors of the same type.
Building the network by joining the two dataframes, if I select nodes with P=A, I'd have:
N F1
1 1/2 # node 1 has only one node having P=A
2 1/3 # node 2 has 1 node with P=A and two with P=B
3 1/3
4 1/2
Once I have this list, I'd need, for each node, either the list or the mean of the values found above for the node's P=A neighbors.
This means:
N F2
1 1/3
2 1/2
3 1/2, 1/3
4 1/3
Code for building the network
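import networkx as nx

G = nx.from_pandas_edgelist(df, source='N', target='T')

# Map each node to its type; P in df is the type of the source node N.
node_type = dict(zip(df['N'], df['P']))

colors = []
for node in G:
    # Compare the node's type, not the node id, against the P values.
    if node_type.get(node) == 'A':
        colors.append("lightblue")
    else:
        colors.append("lightgreen")

nx.draw(G,
        node_color=colors,
        with_labels=True)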
I don't know how to check a node's neighbors and calculate the proportion of them that have the same P, i.e. how to get the expected outputs F1 and F2. That is where I'm stuck.
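Getting F1 is pretty straightforward. For each unique node attribute value, induce the subgraph of the nodes with that value, take each node's degree within that subgraph (its number of same-type neighbours), and divide by the node's degree in the full graph.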
#!/usr/bin/env python
"""
Compute homophily score by node.
"""
import networkx as nx

if __name__ == '__main__':

    # Each link is listed in both directions; nx.Graph stores it only once.
    edges = [
        (1, 2),
        (1, 3),
        (2, 1),
        (2, 3),
        (2, 4),
        (3, 1),
        (3, 2),
        (3, 4),
        (4, 2),
        (4, 3),
    ]

    nodes = [
        (1, {'property' : 'B'}),
        (2, {'property' : 'A'}),
        (3, {'property' : 'B'}),
        (4, {'property' : 'A'}),
    ]

    g = nx.Graph()
    g.add_nodes_from(nodes)
    g.add_edges_from(edges)

    # Degree within the same-type subgraph = number of same-type neighbours.
    neighbours_of_same_type = dict()
    for letter in 'AB':
        nodes_with_property = [node for node, data in g.nodes(data=True) if data['property'] == letter]
        h = g.subgraph(nodes_with_property)
        neighbours_of_same_type.update(h.degree)

    # Degree in the full graph = total number of neighbours.
    degree = dict(g.degree)

    output = dict()
    for node, _ in nodes:
        output[node] = neighbours_of_same_type[node] / degree[node]

    print(output)
    # {1: 0.5, 2: 0.3333333333333333, 3: 0.3333333333333333, 4: 0.5}
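If you would rather start from the dataframe in the question, the same ratio can also be computed by looping over each node's neighbours directly instead of inducing subgraphs. This is only a sketch: the dataframe below is my reconstruction of the table in the question, the attribute name "P" and the helper `same_type_fraction` are my own choices, and it assumes every node appears at least once in column N so that its type is known. `Fraction` is used only so that the result reads like the 1/2 and 1/3 in the question.

```python
from fractions import Fraction

import networkx as nx
import pandas as pd

# Edge list shaped like the question's df: N = source, T = target,
# P = type of the source node N (reconstructed from the question's table).
df = pd.DataFrame({
    "N": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "T": [2, 3, 1, 3, 4, 1, 2, 4, 2, 3],
    "P": list("BBAAABBBAA"),
})

G = nx.from_pandas_edgelist(df, source="N", target="T")

# One type per node, taken from the (N, P) pairs of the edge list.
node_type = df.drop_duplicates("N").set_index("N")["P"].to_dict()
nx.set_node_attributes(G, node_type, name="P")


def same_type_fraction(graph, node):
    """Fraction of `node`'s neighbours that share its P value.

    Hypothetical helper; assumes the node has at least one neighbour.
    """
    neighbours = list(graph.neighbors(node))
    same = sum(graph.nodes[n]["P"] == graph.nodes[node]["P"] for n in neighbours)
    return Fraction(same, len(neighbours))


f1 = {node: same_type_fraction(G, node) for node in sorted(G)}
print(f1)
# {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 3), 4: Fraction(1, 2)}
```

For a graph this small the two approaches are interchangeable; the subgraph version avoids the explicit Python loop over neighbours.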
Once I have this list, I'd need for each node either the list of mean of the values found above of the node's P=A neighbors.
Your definition of F2 makes little sense to me, but I suspect that it can be readily computed from F1.
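One possible reading, and it is only my guess from the expected table in the question, is that F2 for a node is the list of F1 values of that node's neighbours with property 'A', optionally averaged. Under that assumption a sketch could look like this (the names `f1`, `f2` and `f2_mean` are mine):

```python
from fractions import Fraction

import networkx as nx

g = nx.Graph()
g.add_nodes_from([
    (1, {"property": "B"}),
    (2, {"property": "A"}),
    (3, {"property": "B"}),
    (4, {"property": "A"}),
])
g.add_edges_from([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])

# F1: fraction of each node's neighbours that share its property.
f1 = {}
for node in g:
    same = sum(g.nodes[n]["property"] == g.nodes[node]["property"] for n in g[node])
    f1[node] = Fraction(same, g.degree[node])

# Assumed reading of F2: the F1 values of the node's neighbours with property 'A'.
f2 = {
    node: sorted(f1[n] for n in g[node] if g.nodes[n]["property"] == "A")
    for node in g
}

# Mean of those values; nodes without any 'A' neighbour are skipped.
f2_mean = {node: sum(values) / len(values) for node, values in f2.items() if values}

print(f2)
print(f2_mean)
```

With the example graph, node 3 is the only node with two 'A' neighbours (2 and 4), so its list holds 1/3 and 1/2 and its mean is 5/12; the other nodes match the single values in the question's F2 table.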