python optimization networkx graph-theory

Efficient Node Grouping Technique for Centrality Calculations in Graphs

I'm working on optimizing some code that involves frequently accessing values associated with the centrality of each node in a graph. Since nodes with the same neighbors have the same value in my calculations, I thought it might be beneficial to group nodes with identical neighbor sets together. This way, I can reduce the number of calculations by only performing them once per group instead of for every individual node. However, I'm not sure how to efficiently implement this grouping method. Any suggestions or advice on the best way to approach this problem would be greatly appreciated.

This is the implementation i did but it's pretty basic. As you can see the fact that I have 2 loop the code is about O(n^2) which is quite a lot when used on big graph. As @ravenspoint and @PaulBrodersen correctly pointed out, my loop wasnt working correctly so i changed the implementation.

def group_similar_nodes(G):
    node_groups = []
    lonely_node = []
    nbNodes = 0
    neighbors = {}
    for node in G.nodes():
        neighbors[node] = set(G.neighbors(node))
        lonely_node.append(node)
        nbNodes+=1
    for node in range(nbNodes):
        if node in lonely_node:
            group = [node]
            lonely_node.remove(node)
            for node2 in range(node+1,nbNodes):
                if(node !=node2) and (node2 in lonely_node):
                    diff = neighbors[node] ^ neighbors[node2]
                    diff.discard(node)
                    diff.discard(node2)
                    if(not diff):
                        group.append(node2)
                        lonely_node.remove(node2)
            node_groups.append(group)
    return node_groups

Here is a MRE if you want to test the code

import networkx as nx

def group_similar_nodes(G):
    node_groups = []
    lonely_node = []
    nbNodes = 0
    neighbors = {}
    for node in G.nodes():
        neighbors[node] = set(G.neighbors(node))
        lonely_node.append(node)
        nbNodes+=1
    for node in range(nbNodes):
        if node in lonely_node:
            group = [node]
            lonely_node.remove(node)
            for node2 in range(node+1,nbNodes):
                if(node !=node2) and (node2 in lonely_node):
                    diff = neighbors[node] ^ neighbors[node2]
                    diff.discard(node)
                    diff.discard(node2)
                    if(not diff):
                        group.append(node2)
                        lonely_node.remove(node2)
            node_groups.append(group)
    return node_groups

i = 4 #number of communitues
n = 5 # number of nodes per communitues
p_in = 1 # probability of an edge within a community
p_out = 0.1 # probability of an edge between communities

G = nx.planted_partition_graph(i, n, p_in, p_out, seed=42) 
print(group_similar_nodes(G))
#expected result
# [[0, 1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13, 14], [15], [16], [17], [18], [19]]

Solution

After finally understanding the question correctly, the answer is pretty straightforward and can be accomplished in linear time once the nx.Graph object is created. The nx.Graph object at its core represents the graph as its adjacency list i.e. a mapping of nodes to their (unordered list of) neighbours. We want a mapping of neighbours to nodes. All we hence need to do is normalize the representation of neighbour sets and then invert the mapping:

import networkx as nx

def invert_dict(d):
    # https://stackoverflow.com/a/485368/2912349
    inverted = dict()
    for k, v in d.items():
        inverted[v] = inverted.get(v, []) + [k]
    return inverted

if __name__ == '__main__':

    G = nx.Graph([(0, 1), (1, 2), (2, 3), (3, 0)]) # square graph

    d = dict()
    for node in G:
        d[node] = tuple(sorted(G.neighbors(node)))

    inverted = invert_dict(d)
    print(list(inverted.values()))
    # [[0, 2], [1, 3]]