
Network and attributes (nodes - neighbors)


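In this example, a network built from a dataframe df has 4 nodes and 5 undirected links (each link appears twice, once per direction). N is the node, T is a neighbor (target) and P is the property of N: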

N      T      P
1      2      B
1      3      B
2      1      A
2      3      A
2      4      A
3      1      B
3      2      B
3      4      B
4      2      A
4      3      A

The nodes have the following properties (summary):

N       P
1       B
2       A
3       B
4       A

For each node, I'd need to calculate the proportion of neighbors of the same type. Building the network by joining the two dataframes, if I select nodes with P=A, I'd have:

N    F1
1   1/2 # node 1 has only one neighbor with P=A
2   1/3 # node 2 has one neighbor with P=A and two with P=B
3   1/3
4   1/2
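One way to check a node's neighbors directly is to iterate over `G.neighbors(node)` and compare node attributes. The sketch below hard-codes the example graph and stores P as a node attribute named `"P"`; the helper name `same_type_fraction` and the choice to return 0 for isolated nodes are illustrative assumptions, not part of the question:

```python
import networkx as nx

# Undirected example graph: each of the 5 links is listed once
G = nx.Graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])
# Property P of each node, stored as a node attribute named "P"
nx.set_node_attributes(G, {1: "B", 2: "A", 3: "B", 4: "A"}, "P")


def same_type_fraction(G, node, attr="P"):
    """Fraction of `node`'s neighbors whose `attr` equals the node's own value."""
    neighbors = list(G.neighbors(node))
    if not neighbors:
        return 0.0  # assumed convention for isolated nodes (avoids division by zero)
    own = G.nodes[node][attr]
    same = sum(1 for nb in neighbors if G.nodes[nb][attr] == own)
    return same / len(neighbors)


F1 = {node: same_type_fraction(G, node) for node in G}
print(F1)  # {1: 0.5, 2: 0.3333333333333333, 3: 0.3333333333333333, 4: 0.5}
```

The result matches the F1 column above.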

Once I have this list, I'd need, for each node, either the list or the mean of the F1 values found above for the node's neighbors with P=A.

This means:

N   F2
1   1/3
2   1/2
3   1/2, 1/3
4   1/3
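Reading the F2 table against the F1 column, F2 appears to collect F1 over each node's neighbors with P=A. This is an interpretation inferred from the numbers rather than stated explicitly; here N(v) is the set of neighbors of node v:

```latex
\begin{aligned}
F_1(v) &= \frac{|\{\, u \in \mathcal{N}(v) : P(u) = P(v) \,\}|}{|\mathcal{N}(v)|} \\
A_v &= \{\, u \in \mathcal{N}(v) : P(u) = \mathrm{A} \,\} \\
F_2^{\mathrm{list}}(v) &= \big[\, F_1(u) \,\big]_{u \in A_v} \\
F_2^{\mathrm{mean}}(v) &= \frac{1}{|A_v|} \sum_{u \in A_v} F_1(u) \\[4pt]
\text{node 3: } A_3 &= \{2, 4\}, \quad F_2^{\mathrm{list}}(3) = \big[\tfrac{1}{3}, \tfrac{1}{2}\big], \quad F_2^{\mathrm{mean}}(3) = \tfrac{1}{2}\big(\tfrac{1}{3} + \tfrac{1}{2}\big) = \tfrac{5}{12}
\end{aligned}
```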

Code for building the network

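import networkx as nx
import pandas as pd

# Edge list: N = node, T = neighbor (target), P = property of node N
df = pd.DataFrame({
    "N": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "T": [2, 3, 1, 3, 4, 1, 2, 4, 2, 3],
    "P": ["B", "B", "A", "A", "A", "B", "B", "B", "A", "A"],
})

G = nx.from_pandas_edgelist(df, source='N', target='T')
# Store each node's P as a node attribute (one row per node is enough)
nx.set_node_attributes(G, df.drop_duplicates("N").set_index("N")["P"].to_dict(), "P")

# Colour by the node's own P (comparing the node id to df["P"] never matches)
colors = ["lightblue" if G.nodes[node]["P"] == "A" else "lightgreen" for node in G]

nx.draw(G, node_color=colors, with_labels=True)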

I don't know how to check a node's neighbors and calculate the proportion of them that have the same P, and therefore how to get the expected outputs F1 and F2. I think that is where the problem lies.
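Since the question is tagged pandas, F1 can presumably also be computed straight from the edge list without building a graph: look up each neighbor T's property and average the matches per node. This sketch assumes, as in the example, that every link appears in both directions in `df`; the column name `P_T` is hypothetical:

```python
import pandas as pd

# Edge list as in the question: N = node, T = neighbor, P = property of N
df = pd.DataFrame({
    "N": [1, 1, 2, 2, 2, 3, 3, 3, 4, 4],
    "T": [2, 3, 1, 3, 4, 1, 2, 4, 2, 3],
    "P": ["B", "B", "A", "A", "A", "B", "B", "B", "A", "A"],
})

# One property per node (the "summary" table), indexed by node id
props = df.drop_duplicates("N").set_index("N")["P"]

# Property of the neighbor on each row
df["P_T"] = df["T"].map(props)

# Mean of a boolean Series per node = fraction of neighbors with the same P
F1 = (df["P"] == df["P_T"]).groupby(df["N"]).mean()
print(F1)
```

On the example data this gives 0.5, 1/3, 1/3 and 0.5 for nodes 1 to 4.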


Solution

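  • Getting F1 is pretty straightforward. For each unique node attribute value, induce the subgraph on the nodes with that value. A node's degree in that subgraph is its number of same-type neighbours, so divide it by the node's overall degree in the full graph.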

    #!/usr/bin/env python
    """
    Compute homophily score by node.
    """
    import networkx as nx
    
    
    if __name__ == '__main__':
    
        edges = [
            (1,      2),
            (1,      3),
            (2,      1),
            (2,      3),
            (2,      4),
            (3,      1),
            (3,      2),
            (3,      4),
            (4,      2),
            (4,      3),
        ]
    
        nodes = [
            (1, {'property'  :    'B'}),
            (2, {'property'  :    'A'}),
            (3, {'property'  :    'B'}),
            (4, {'property'  :    'A'}),
        ]
    
        g = nx.Graph()
        g.add_nodes_from(nodes)
        g.add_edges_from(edges)
    
        neighbours_of_same_type = dict()
        for letter in 'AB':
            nodes_with_property = [node for node, data in g.nodes(data=True) if data['property'] == letter]
            h = g.subgraph(nodes_with_property)
            neighbours_of_same_type.update(h.degree)
    
        degree = dict(g.degree)
        output = dict()
        for node, _ in nodes:
            output[node] = neighbours_of_same_type[node] / degree[node]
    
        print(output)
        # {1: 0.5, 2: 0.3333333333333333, 3: 0.3333333333333333, 4: 0.5}
    

    Once I have this list, I'd need for each node either the list of mean of the values found above of the node's P=A neighbors.

    Your definition of F2 makes little sense to me, but I suspect it can be readily computed from F1.
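A minimal sketch of that, assuming F2 means the F1 values of a node's neighbours with P = A, kept as a list and optionally averaged. The names `F2_list` and `F2_mean` are hypothetical, and F1 is recomputed inline so the snippet runs on its own:

```python
from statistics import mean

import networkx as nx

g = nx.Graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)])
nx.set_node_attributes(g, {1: "B", 2: "A", 3: "B", 4: "A"}, "property")

# F1: fraction of neighbours sharing the node's own property
F1 = {
    n: sum(g.nodes[nb]["property"] == g.nodes[n]["property"] for nb in g[n]) / g.degree[n]
    for n in g
}

# F2 (assumed reading): F1 values of the node's neighbours with property "A"
F2_list = {
    n: [F1[nb] for nb in g[n] if g.nodes[nb]["property"] == "A"]
    for n in g
}
# Averaged variant; None for nodes without any "A" neighbour
F2_mean = {n: mean(vals) if vals else None for n, vals in F2_list.items()}

print(F2_list)
print(F2_mean)
```

On the example graph, node 3 gets [1/3, 1/2] (mean 5/12) and nodes 1, 2 and 4 each get a single value, which matches the F2 table up to the order of the list.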